
Production Support/Application Testing/Software Defect and IBM Mainframe COBOL ABEND Research

When an application ABEND (ABnormal END-of-job) occurs, z/OS stops executing your program, closes files and buffers, and generates a single high-level message in the form of a System Completion Code (Sxxx). The System Completion Code is usually written to an output listing file through your //SYSOUT DD SYSOUT=* JCL entry. This completion code indicates why the system has decided to stop executing your application. It is related, but often only loosely, to what is really wrong with your application. Because of this, the System Completion Code represents only the starting point for your analysis of the problem.

Other Debugging Assistance

Along with the System Completion Code, use IBM's Problem Determination tools (PD Tools). These generate a listing (SYSOUT) which describes:
  - The System Completion Code (and often a short text description of what it designates)
  - A short explanation of the cause of the ABEND
  - The COBOL instruction (statement) or line number which contained the invalid operation causing z/OS to halt execution
  - A "core-dump" (a hexadecimal printout) of the internal machine storage and registers relevant to the areas of your program surrounding the COBOL instruction which caused z/OS to halt execution

This information is useful to begin understanding and researching the problem, but it is usually far from sufficient to solve the problem, which could be any combination of:
  - Incomplete, incorrect or invalid COBOL procedural logic
  - A typo such as a misplaced period, or an incorrectly specified field
  - Incorrect or invalid input data
  - Batch jobs run out of sequence
  - Input files missing or corrupted (hardware errors)
  - Errors which relate to JCL problems, etc.

There are as many different ways to analyze and research COBOL ABENDs as there are individual approaches to writing procedural logic.
However, if you've never done this type of "logic-detective" work on a large scale, consider the following five-step approach to help you get started with this complex and crucial process:
  1. Preparation
  2. Research
  3. Hypothesis
  4. Solution
  5. Resolution

As a final note before beginning, understand that there are really two distinct phases of Production Support:
  1. Data Center on-call ABEND resolution - wherein a technician receives notification that a job or transaction has ABENDed and must be "fixed" within an extremely short timeframe (usually minutes to hours). In this case, the technician's main concern is to "patch" the problem - get the system back online, or get the batch jobstream back into production ("Patch-It").
  2. Next-day problem resolution - wherein technician(s) actually track down and solve the problem that caused the ABEND ("Fix-It").

The steps below represent a process for "Fix-It" - they go well beyond the scope of the emergency measures used to "patch" the problem during an on-call emergency.

1. Preparation - Collect all necessary background information (WHAT happened and WHERE the ABEND occurred)
  - Print out the ABEND information
  - Collect all supporting ABEND output (SYSOUT) from the job (ABEND-AID, DISPLAY statements, etc.)
  - Obtain copies of the run-time:
      - JCL
      - Program source and all copybooks (or expanded source listing)
  - From the JCL, learn the dataset names of input and output files accessed by the program (which you may need to browse as part of your research)
  - Learn the nature of the batch job from system documentation, or from an application business expert (at least at the level of module-flow and file-access)

2. Research - Construct a mental map (understanding) of the program's execution (HOW the ABEND occurred)

Making the correct WHY determination usually requires a combination of "Static" and "Dynamic" analysis - complementary research and investigative approaches.

Note: These steps need not be followed in this order. Rather, in time you will develop an "intuition" as to which kind(s) of analysis will be most likely to provide the information you need to solve your problem in a production support role.

Static Analysis:

1. Structural Visualization: The generation of an accurate mental map, understanding or mental image of the program's control structure, or logic-architecture. Using the starting point represented by the ABEND condition (the statement which caused z/OS to halt execution) and using electronic-assisted tools (such as IBM's Rational Asset Analyzer or Rational Developer for System z), build an accurate understanding of the code invocation at:
  - The module/file level (System View)
  - Paragraph/Section level (Hierarchy chart)
  - Statement level (Flow chart), if necessary, i.e. if the code is dense or complex
Structural Visualization can be done "top-down", by asking open-ended questions, such as learning how a particular routine "hangs together logically", or it can be used "bottom-up", by asking specific closed-ended questions about a program, such as "How does this particular paragraph get executed?" or "How did this module get invoked?"

2. Data Flow Analysis: A combination of control structure analysis and data item analysis, which seeks to determine the usage of particular fields throughout a program. Data flow analysis is used to determine (from a given instance of a data item) where the next occurrence(s) of that item exist in your program, and how the data item is used: as a receiving field in a MOVE or mathematical operation, as the sending field in a MOVE statement, or as part of a logic-branch (IF, PERFORM UNTIL/VARYING, etc.).

3. Data Impact Analysis: An expansion of Data Flow Analysis which traces the movement of data from field to field throughout a program, or throughout an entire application, including I/O (screens and files). Using Data Impact Analysis, you can identify all fields that might have had an impact on the contents of a field (before the ABEND occurred). And just as importantly, you can learn the effect changing this field will have on the behavior of the application.

4. Textual or Data Item Usage: Utilized more for application maintenance and enhancement requests, this type of Static Analysis involves searching for "categories" of program-items, such as "List all fields that contain *JUL*, *GREG*, *YR*, *YEAR*" (suspect date candidates for Year 2000 conversion), or "List all such fields with two-digit (numeric) or two-byte (alphanumeric) definitions".

5. Code Partitioning: Again, utilized more for application maintenance, enhancements and application reengineering, Code Partitioning involves mentally organizing and analyzing code by function or process, such that you understand and can distinguish the usage of code by business process. For example: "Find all code that relates to the calculation of premium renewal payments" or "Isolate the code that edits a particular file", with an eye towards creating a shared subroutine from the code.

Dynamic Analysis:

1. Tracing: Source-level interactive debugging. Watch the program execute statement-by-statement, and line-by-line. This is very useful for detailed debugging, particularly of dense or complex instructions. Some software (for example, Rational Developer for System z) allows you to trace the program logic, attempting to re-create the sequence of events (COBOL statements) that transpired up to and including the ABEND condition. Tracing is an invaluable method for detailed debugging. However, given the size and scope of production applications, it is generally more practical to trace specific problem areas of a program.

2. Interactive Execution: Execute (run) a program, stopping at selective breakpoints (pause execution each time a certain field-value changes, or when a value exceeds some threshold), and examining the contents (value) of specific fields. Interactive Execution must be done by (or with) an application analyst who understands how the system is supposed to operate. Interactive Execution is useful for observing control flow, and is often combined with line-by-line tracing by setting selective breakpoints, monitoring values, "running" the application to the breakpoints, and then tracing the code line-by-line.

3. Selective Data State Collection: Execute code and establish a functional summary of specific data states that it creates. Use these states in subsequent test runs to compare results of current values to expected values.

4. Coverage: Analyze the number of times each COBOL statement is executed for a given run. This technique is extremely useful for analyzing test data coverage of a given application. It can also be used effectively for debugging if it makes apparent problems such as infinite loops (S222, S322 and B37 ABENDs) or over-loading tables (loading tables beyond the maximum OCCURS clause and overlaying storage, which can cause S0C1, S0C4, and S0C7 ABENDs).
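The Coverage technique in item 4 can be illustrated with a small sketch. It is written in Python rather than COBOL, and it assumes you already have a trace of executed paragraph names from some tool; the paragraph names, the threshold, and the function name are all hypothetical.

```python
from collections import Counter

def coverage_report(trace, all_paragraphs, loop_threshold):
    """Summarize how often each paragraph ran during one test run."""
    counts = Counter(trace)
    # Paragraphs the test data never reached (gaps in test coverage).
    never_run = [p for p in all_paragraphs if counts[p] == 0]
    # Paragraphs executed suspiciously often (possible runaway loop).
    suspects = [p for p in all_paragraphs if counts[p] > loop_threshold]
    return counts, never_run, suspects

# Hypothetical trace: a read/load loop that ran 500 times.
trace = ["1000-INIT"] + ["2000-READ", "3000-LOAD-TABLE"] * 500 + ["9000-TERM"]
paragraphs = ["1000-INIT", "2000-READ", "3000-LOAD-TABLE", "8000-ERROR", "9000-TERM"]

counts, never_run, suspects = coverage_report(trace, paragraphs, loop_threshold=100)
print("never executed:", never_run)
print("executed more than 100 times:", suspects)
```

If the table being loaded by 3000-LOAD-TABLE is declared with fewer than 500 occurrences, a count like this points directly at the over-loading problem described above.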
Using a COBOL research and analysis tool (such as IBM's Rational Asset Analyzer or Rational Developer for System z, or some other source-level analysis software), perform Static and/or Dynamic Analysis on the specific areas of the application relating to the ABEND, to determine HOW this particular problem occurred in the application, based on WHERE the problem manifested itself to the system (obtained from the ABEND-AID listing of which statement caused the ABEND).

3. Hypothesis - Determine WHY the ABEND occurred

With the research in steps 1 and 2, you should be able to describe WHAT, WHERE and HOW the ABEND occurred (at what point in the program the logic failed, and what sequence of COBOL statements caused the failure).

However, before modifying any logic, you must determine WHY these statements (or sequence of events) caused this particular failure (e.g. "Why did this production input file contain spaces in a numeric field?" "Why did the program's logic perform the Initialization routine twice?" "Why did the Read routine execute past end-of-file?", etc.). Only through a determination of WHY will you be able to make a change to production business logic safely, and with confidence that:

  - Your change will resolve the ABEND
  - Your change will not introduce new (additional) ABENDs

Sometimes it is relatively easy to come to an understanding of WHY certain ABEND conditions occurred. For example, perhaps a period was left off the appropriate termination point for an IF statement, which caused execution to perform an operation out of sequence. Or perhaps an IF .. NUMERIC test (which should have been coded for all numeric fields in a file) was forgotten. Or a paragraph was performed through the wrong paragraph-exit, or a production job was released before certain files were available (causing I/O errors). These types of ABEND situations can be understood (and usually resolved) fairly quickly. However, this is not always the case.

What if, in the case of the IF statement with the incorrect termination point, the logic as coded correctly processed the first 100,000 records in the file? Making a change to a critical IF condition could very well affect other down-stream processing within the program, wreaking havoc with subsequent routines. Or what if, in the case of the file containing blanks in the numeric fields, the input file was supposed to be "clean" (validated) by this point in the jobstream, having gone through allegedly "exhaustive" edits in prior modules? By simply adding an IF test you may solve your program's specific ABEND, but you will not have resolved the actual problem, which exists somewhere else in the system. In other words, provincial approaches to resolving production ABENDs are not recommended, as they usually change the problem instead of solving it.

It should be noted that a clear understanding of the business functionality automated by this process is usually required to completely resolve WHY something has gone wrong. Calling on business experts or "application/business" experts who understand "the big picture", and the context in which the job executes, is the rule rather than the exception to this process.

Developing a clear and accurate determination of WHY a problem that led to an ABEND condition exists may take a considerable amount of time, depending on the:
  - Size, complexity and structure of the code
  - Your familiarity with the program's business purpose, coupled with your ability to grasp the point of each statement (assuming you didn't write the code)
  - Type of ABEND and reason for the problem (some are more diabolical than others)
  - Size of the input/output files, and capabilities of your file editor

Note that, in addition to an understanding of the reason for the ABEND, the results of your investigation should produce an understanding of the solution to the problem (the fix itself).

4. Solution - Fix the problem and test your solution

Take the appropriate action to resolve any business or system-wide issues. Depending on how extensive the damage caused by the problem is, or for how long any problems have persisted undetected:
  - Files may have to be restored from backups from a previous point-in-time
  - Jobs may have to be re-run from a previous point-in-time (synchronized with file generations)
  - Files may have to be modified with "one-shot" programs, written to resolve issues that require "surgery" on the data

Take the appropriate action to fix the technical (coding) problem:
  - Edit program source, modifying the existing production logic, and/or
  - Modify the JCL (if the error included JCL issues)

Test your solution:
  - Compile and Link the new version of the application
  - Create an "image copy" of the production file system, in order to test your fix
  - Re-run the batch job and analyze results
  - Run "Regression Tests" against the new code and analyze for unexpected results

5. Resolution - Build and migrate back into production
  - Promote your changes into production
  - Schedule and re-run the cycle
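The regression test in the Solution step comes down to comparing the output of the corrected program against a trusted baseline, record by record. The following is a minimal sketch in Python, assuming both outputs have been read into lists of records; the function name and sample records are hypothetical.

```python
def compare_outputs(baseline, candidate):
    """Return (record number, baseline record, new record) for each mismatch."""
    differences = []
    for index in range(max(len(baseline), len(candidate))):
        old = baseline[index] if index < len(baseline) else None
        new = candidate[index] if index < len(candidate) else None
        if old != new:
            # Record numbers are reported 1-based, as a file browser shows them.
            differences.append((index + 1, old, new))
    return differences

# Hypothetical outputs: record 2 changed and the new run wrote an extra record.
baseline_run = ["0001 SMITH 0100.00", "0002 JONES 0250.00"]
fixed_run = ["0001 SMITH 0100.00", "0002 JONES 0275.00", "0003 BROWN 0010.00"]

for record_number, old, new in compare_outputs(baseline_run, fixed_run):
    print(record_number, old, new)
```

Any difference that is not explained by the fix itself is an "unexpected result" and deserves investigation before the change is promoted.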

Appendix - ABEND Completion Codes and some typical causes

While there is a wide variety of reasons for ABEND conditions ("WHYs") in production systems, it is possible (and useful) to categorize and organize HOW certain conditions often lead to certain types of ABEND completion codes, in order to expedite or streamline your analysis and research (an 80/20 approach to analysis). The following information on a few common z/OS ABEND completion codes, and the conditions which generate them, is included to help you make effective use of ABEND-AID listings and the above debugging, research and analysis process.

S0C1

Attempt to execute an invalid machine instruction

S0C1s occur due to COBOL:
  - Table-handling overlay (MOVEs to table subscripts/indexes which are out-of-range and which overwrite PROCEDURE DIVISION instructions)
  - Statements referencing LINKAGE SECTION fields incorrectly
  - CALLs to an invalid subroutine name

The COBOL compiler always generates valid machine instructions. S0C1s usually occur when populating tables beyond the valid OCCURS range.

Typical Reasons for S0C1s / Explanation

  Reason: Moving elements to a table using a subscript or index which contains a value beyond the maximum OCCURS in the table declaration
  Explanation: This usually happens because of a loop that is not terminated correctly, such as a routine which populates a table from an input file containing more records than the table OCCURS declaration provides for. It can also happen through a MOVE or invalid math statement which computes an invalid subscript/index value.

  Reason: Referencing incorrectly defined/passed LINKAGE SECTION fields
  Explanation: If the definitions of your LINKAGE SECTION fields do not match, or the definitions in the called program are larger than the calling program, you could be attempting to reference data outside of valid storage when statements which reference those fields execute.

  Reason: CALL to an invalid or unavailable module-name
  Explanation: If your program makes a dynamic CALL and the module-name being called is not found, you can get S806, S0C4 or S0C1 system errors. The reasons for invalid module-names include: misspelling the name; incorrectly specifying the STEPLIB/JOBLIB DSN= in the JCL (or incorrectly concatenating the STEPLIB/JOBLIB datasets); and leaving out apostrophes (or quotes) on a CALL literal, which would cause the COBOL compiler to treat the statement as if it were a CALL identifier. If an identifier with that name exists in the Data Division, COBOL will attempt a dynamic CALL to the value of the identifier.
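The table-overlay mechanism behind many S0C1s can be modeled in a few lines. This is only an illustrative sketch in Python, not how COBOL storage is actually laid out: it assumes a 3-element table of 2-byte entries followed immediately by a 4-byte numeric counter, and all names are hypothetical.

```python
ELEMENT_LEN = 2   # bytes per table entry (assumed)
OCCURS_MAX = 3    # table declared with OCCURS 3 TIMES (assumed)

def new_storage():
    # Three blank table entries followed by an adjacent counter field "0000".
    return bytearray(b"  " * OCCURS_MAX + b"0000")

def store_unchecked(storage, subscript, value):
    """Model of a MOVE to TABLE-ENTRY(subscript) with no range checking."""
    offset = (subscript - 1) * ELEMENT_LEN
    storage[offset:offset + ELEMENT_LEN] = value

def store_checked(storage, subscript, value):
    """Same store, but the subscript is validated first (>= 1, <= OCCURS max)."""
    if not 1 <= subscript <= OCCURS_MAX:
        raise IndexError(f"subscript {subscript} outside 1..{OCCURS_MAX}")
    store_unchecked(storage, subscript, value)

storage = new_storage()
store_unchecked(storage, 4, b"AB")      # one entry past the end of the table
counter = bytes(storage[6:10])          # the field that follows the table
print(counter)                          # the counter no longer holds digits
```

In the model the damage lands in an adjacent data field; in a real load module the overlaid storage may just as easily hold instructions or addresses, which is how the same logic error can surface as S0C1, S0C4 or S0C7.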

S0C4

Attempt to reference an invalid storage address

S0C4s occur due to COBOL:
  - Table-handling overlay errors (MOVEs to table subscripts/indexes which are out-of-range and which overwrite PROCEDURE DIVISION instructions)
  - Statements referencing LINKAGE SECTION fields incorrectly
  - CALLs to an invalid subroutine name
  - STOP RUN or GOBACK in the INPUT or OUTPUT PROCEDURE when using the COBOL SORT verb
  - Attempt to access an unopened dataset

Unless your program is executing with "bounds-checking" (supported by CA-Capex Optimizing, COBOL II and COBOL/370, and generally not used in production), your table routines could overlay the contents of storage beyond the boundary of the OCCURS clause. This can cause S0C7s (see below), S0C1s and S0C4s, either by overwriting field values in the Data Division (S0C7s) or by actually overwriting the instructions in your PROCEDURE DIVISION, producing invalid addresses (operands) for the executable (machine) code (which in turn can cause S0C1s and S0C4s).

Typical Reasons for S0C4s / Explanation

  Reason: Table subscript or index contains a zero value
  Explanation: Verify that all table-handling subscript/index references are within the allowable range of the table's OCCURS clause (>= 1, <= OCCURS max).

  Reason: Moving elements to a table using a subscript or index which contains a value beyond the maximum OCCURS in the table declaration
  Explanation: This usually happens because of a loop that is not terminated correctly, such as a routine which populates a table from an input file containing more records than the table OCCURS declaration provides for. It can also happen through a MOVE or invalid math statement which computes an invalid subscript/index value.

  Reason: Referencing incorrectly defined/passed LINKAGE SECTION fields
  Explanation: If the definitions of your LINKAGE SECTION fields do not match, or the definitions in the called program are larger than the calling program, you could be attempting to reference data outside of valid storage when statements which reference those fields execute.

  Reason: CALL to an invalid or unavailable module-name
  Explanation: If your program makes a dynamic CALL and the module-name being called is not found, you can get S806, S0C4 or S0C1 system errors. The reasons for invalid module-names include: misspelling the name; incorrectly specifying the STEPLIB/JOBLIB DSN= in the JCL (or incorrectly concatenating the STEPLIB/JOBLIB datasets); and leaving out apostrophes (or quotes) on a CALL literal, which would cause the COBOL compiler to treat the statement as if it were a CALL identifier. If an identifier with that name exists in the Data Division, COBOL will attempt a dynamic CALL to the value of the identifier.

S0C7

Data exception (invalid numeric data in a numeric field, caught by a machine instruction such as Convert-to-Binary during a mathematical operation or numeric compare)

S0C7s can occur on the following COBOL:
  - Arithmetic instructions: ADD, SUBTRACT, MULTIPLY, DIVIDE, COMPUTE
  - Comparisons involving tests of numeric fields (which can occur with the statements): IF, EVALUATE, PERFORM UNTIL, PERFORM VARYING, GO TO DEPENDING
  - MOVE statements when the receiving field is packed (COMP-3) or binary (COMP) and the sending field contains invalid numeric data

S0C7s occur when z/OS finds invalid numeric data in a field defined as PIC 9 during arithmetic or compare operations. This most commonly involves zoned-decimal (DISPLAY) and packed-decimal (COMP-3) fields; binary (COMP) fields are typically affected when they receive data converted from an invalid decimal source.

Note: S0C7s do not occur on an IF statement comparing PIC X fields.
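When browsing a suspect input file in hexadecimal, it helps to know what counts as valid packed-decimal (COMP-3) data: every half-byte (nibble) except the last must be a digit 0-9, and the last nibble must be a sign code A-F. The sketch below, in Python, applies that rule to raw bytes, for instance on a downloaded copy of the file; the function name is hypothetical and zoned-decimal fields are not covered.

```python
def is_valid_packed(field: bytes) -> bool:
    """Check raw bytes against the packed-decimal (COMP-3) format rule."""
    if not field:
        return False
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)      # high-order nibble
        nibbles.append(byte & 0x0F)    # low-order nibble
    *digits, sign = nibbles
    # Digits must be 0-9; the final nibble must be a sign code (A-F).
    return all(d <= 9 for d in digits) and sign >= 0x0A

samples = {
    "positive 12345": bytes.fromhex("12345C"),
    "EBCDIC spaces": bytes.fromhex("404040"),
    "LOW-VALUES": bytes.fromhex("000000"),
    "HIGH-VALUES": bytes.fromhex("FFFFFF"),
}
for label, raw in samples.items():
    print(label, is_valid_packed(raw))
```

Spaces, LOW-VALUES and HIGH-VALUES all fail the check, which is why uninitialized work areas and records read after end-of-file are such frequent sources of S0C7s.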

Typical Reasons for S0C7s / Explanation

  Reason: Failure to initialize a WORKING-STORAGE field
  Explanation: Be sure all numeric work areas contain a VALUE clause at the elementary level or are correctly INITIALIZEd before they are used (as other than receiving fields in MOVE statements) within your program. Be particularly careful with "counters and accumulators". Also, always initialize elementary (rather than group) COMP-3 fields.

  Reason: Non-numeric "input data" in a numeric field
  Explanation: May need an "IF NUMERIC" test, or you may need to browse output files produced in a previous job step ("input" to this program).

  Reason: Fall-through logic, or invalid branching sequence
  Explanation: Sometimes program logic errors force program execution into a paragraph out of sequence, such as executing an edit routine before a record is READ, or after the file has been closed (and spaces or HIGH-VALUES have been moved to the record).

  Reason: MOVE statements when the receiving field's definition is COMP or COMP-3
  Explanation: The type of MOVE statement generated by the COBOL compiler is based on the datatype definition of the receiving field. If the receiving field is COMP or COMP-3, COBOL generates an "algebraic MOVE". This will result in a S0C7 if the sending field contains invalid numeric data. "IF NUMERIC" tests on the sending field may be necessary prior to the MOVE statement.

  Reason: Table-loading overlay errors
  Explanation: This can happen if a table-loading process overlays data beyond the table OCCURS range (i.e. non-numeric data can be moved to numeric-defined fields that are adjacent to the storage area set aside for the table through its OCCURS clause).

  Reason: Referencing incorrectly defined/passed LINKAGE SECTION fields
  Explanation: If the definitions of your LINKAGE SECTION fields do not match, your program may reference non-numeric data through numeric field definitions.

S0CB

Attempt to divide by zero/decimal-divide overflow

S0CBs occur due to COBOL:
  - DIVIDE statements, if the quotient in a division using a decimal operand is greater than the size of the receiving field
  - Division by zero

Note that S0CB ABENDs may be intercepted by COBOL library subroutines (which automatically check for zero before dividing). If this is the case, zero-divide will result in "user" return-codes:
  - U0203 - OS/VS COBOL
  - U1061 - VS COBOL II

Typical Reasons for S0CBs / Explanation

  Reason: DIVIDE by zero
  Explanation: Program logic should always check to see if the divisor has been properly initialized or updated or, in the case of input edits and data validation, that the divisor is not zero before doing the division. Also check to see whether a fractional value was MOVEd to an integer field, truncating the fractional value and resulting in a zero divide.

  Reason: Decimal DIVIDE exception
  Explanation: Check the specification of the COMP-3 receiving field, the placement of the V in the receiving field definition (and the overall definition of the receiving field). Also, check to see if the ON SIZE ERROR condition should have been coded.

S001

Input/Output problem

S001s occur due to COBOL logic errors:
  - File READ/WRITE error
  - File OPEN/CLOSE error

S001 errors occur primarily due to incorrect COBOL logic (fall-through errors, logic executed out of sequence, etc.)

Typical Reasons for S001s / Explanation

  Reason: S001 on a READ operation
  Explanation: Occurs if your program READs before opening a file or READs after closing a file. (Place file OPEN/CLOSE statements in dedicated Initialization and Termination paragraphs.) Can also occur if your program READs past the end-of-file condition. (Create a unique end-of-file switch for each file your program reads, and test that switch on the READ statement and in the PERFORM UNTIL.) Can also occur if your program attempts to READ from a file OPEN for OUTPUT.

  Reason: S001 on a WRITE operation
  Explanation: Occurs if you WRITE before opening a file or after closing a file (see above on Initialization/Termination routines). Can also occur if your program attempts to WRITE to a file OPEN for INPUT.

S013

Conflict in DCB (Data Control Block) parameters

S013s occur due to inconsistencies between COBOL file description statements in your program, and:
  - The DCB (data control block) parameter specified on the file DD statement in your JCL (for output files), or
  - The DCB entry taken from the physical file DCB parameters, stored on the file's device header.

Typical Reasons for S013s / Explanation

  Reason: S013 on an OPEN statement for an input file
  Explanation: Occurs if your program's RECORD CONTAINS clause conflicts with the physical file's record length, or if your program's BLOCK CONTAINS clause conflicts with the physical file's blocking factor. Suggestion: on input files, do not specify RECORD CONTAINS. Code BLOCK CONTAINS 0 RECORDS.

  Reason: S013 on an OPEN statement for an output file
  Explanation: Occurs if your program's RECORD CONTAINS clause conflicts with the file's JCL (LRECL= size), or if your program's BLOCK CONTAINS clause conflicts with the file's JCL BLKSIZE= parameter. Suggestion: on output files, code BLOCK CONTAINS 0 RECORDS.

S213

File open error

S213s occur when an input file is not found. This can happen if:
  - The file does not exist, or
  - The filename is misspelled on the JCL DSN= parameter

Typical Reasons for S213s / Explanation

  Reason: S213 on an OPEN statement for an input file
  Explanation: Occurs on file OPEN when the system cannot find the input filename as specified in your JCL. This can happen because of a simple typo in the JCL, or because a previous job failed to complete successfully.

S122/222/322

Operator cancel

S122/S222s occur when an operator cancels a job:
  - S122 means the job was canceled and a storage dump was requested
  - S222 means the job was canceled, but a dump was not requested (although, depending on which z/OS routine was active when the job was canceled, a dump may have been produced)

S322s occur when z/OS cancels a job because the default or specified CPU time limit for a job step or procedure was exceeded.

(Note on S122/222) It is important to note that S122/222 job cancellations are "judgment calls" by the system operator, and that in fact there may be nothing wrong at all. Always begin your research by calling the operator and requesting an explanation of why they canceled the job.

(Note on S322) If a job that normally processes 100,000 records jumps to 10,000,000, or if it is run on a slower CPU with slower external devices, S322 may simply signify that you have to increase the CPU time limit in the JCL.

However, it could be that S122/222/322s occur because of program logic or job execution errors:

Typical Reasons for S122/222/322s / Explanation

  Reason: Job is deadlocked (program is in a Wait state)
  Explanation: Occurs when a file your program has requested cannot be allocated to your process, because some other program is using it. This generally occurs when jobs are initiated out of sequence.

  Reason: Program is in an infinite loop
  Explanation: Occurs when your logic repeatedly executes the same routines over and over. Generally due to incorrectly setting or checking switches and return-codes, or some type of fall-through error.

S806

Requested Load Module not found

S806s occur when a called program (or system subroutine) is not found. This can happen if:
  - The module name is misspelled on the CALL statement
  - The module was not successfully LINKed into the application
  - The program name is misspelled on the JCL EXEC PGM= parameter
  - The STEPLIB/JOBLIB DD statements point to incorrect load libraries, or the libraries are incorrectly concatenated

Typical Reasons for S806s / Explanation

  Reason: Module name is misspelled
  Explanation: If your program makes a dynamic CALL and the module-name being called is not found, you can get S806, S0C4 or S0C1 system errors. The reasons for invalid module-names include: misspelling the name; incorrectly specifying the STEPLIB/JOBLIB DSN= in the JCL (or incorrectly concatenating the STEPLIB/JOBLIB datasets); and leaving out apostrophes (or quotes) on a CALL literal, which would cause the COBOL compiler to treat the statement as if it were a CALL identifier. If an identifier with that name exists in the Data Division, COBOL will attempt a dynamic CALL to the value of the identifier.

B37/E37

Out of space condition

B37/E37s occur when there is insufficient space on an output device. This can occur because of:
  - Insufficient SPACE allocated through the JCL for an output file, in which case you should re-estimate the SPACE requirements for your output file and increase the SPACE allocation
  - Insufficient SPACE on a particular DASD device, in which case you should either choose a different device, or remove some files from the pack
  - A program logic error, such as an infinite loop which includes WRITE statements

Typical Reasons for B37/E37s / Explanation

  Reason: Program is in an infinite loop in a WRITE routine
  Explanation: Occurs when your logic repeatedly executes a WRITE statement over and over. Generally due to incorrectly setting or checking switches and return-codes, or some type of fall-through error.
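Re-estimating SPACE for an output file is mostly arithmetic. The following is a rough sketch in Python, assuming fixed-length blocked records. The blocks-per-track and tracks-per-cylinder figures depend on the device type and block size and must be looked up for your installation; the values below (2 and 15) are illustrative assumptions, and the function name is hypothetical.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)

def estimate_space(record_count, lrecl, blksize, blocks_per_track, tracks_per_cylinder):
    """Estimate blocks, tracks and cylinders for a fixed-blocked output file."""
    records_per_block = blksize // lrecl
    blocks = ceil_div(record_count, records_per_block)
    tracks = ceil_div(blocks, blocks_per_track)
    cylinders = ceil_div(tracks, tracks_per_cylinder)
    return blocks, tracks, cylinders

# One million 80-byte records in 27,920-byte blocks (349 records per block).
blocks, tracks, cylinders = estimate_space(1_000_000, 80, 27920, 2, 15)
print(blocks, tracks, cylinders)   # prints: 2866 1433 96
```

If the estimate is far below what the job actually tried to write before the B37/E37, suspect the infinite-loop case rather than the allocation.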
