Function List - COBOL

Post by rmbunton » Wed Aug 03, 2011 11:17 am

Earlier I read a discussion thread on COBOL function lists in UltraEdit and played around with the regex from that discussion. My goal is a function list for COBOL that lists all paragraphs in the procedure division. I do not have a problem with the "exit" paragraphs showing up. I started out with the regex that came with the COBOL wordfile that I downloaded from the IDM web pages; it is the first line of my current regex for the function list. Below is my current function list regex:

%[ ^t]+^([a-z^-]+^) ^{division^} ^{section^}
%[ ^t]++^([0-9a-z-_]++[~end-if .]^).^
% 0000-^main

I left the first regex line in because I like to have the COBOL sections and divisions as a point of reference. The second regex line seems to get most of the COBOL paragraphs to show up. For your information, I used "[~end-if .]" to get rid of a bunch of "end-if"s showing up in my list. The third regex line is there to get the "0000-main" paragraph to show up in the list. Below is a code snippet from the COBOL program, which shows only the paragraphs.

0000-main. <----- doesn't find with just the second regex line. why??
0010-server-init.
0020-assign-input-fields.
0030-assign-output-fields.
0040-standardize-sp-params.
0050-validate-logon. <----- doesn't find with all 3 regex lines. why??
0055-get-users-hipaa-unit.
0060-process-request.
0070-LOG-ACTIONS.

So, I guess I have two questions. One, why does the second line of my regex not find paragraph "0000-main."? Two, why does paragraph "0050-validate-logon." not show up in my list?

Re: Function List - COBOL

Post by Mofi » Wed Aug 03, 2011 12:31 pm

Below I explain the 3 UltraEdit regular expressions you used and what their problems are.

1) %[ ^t]+^([a-z^-]+^) ^{division^} ^{section^}

% ... start the search at beginning of a line.

[ ^t]+ ... find 1 or more spaces/tabs at the beginning of a line. So a line with the word division or section must have preceding whitespace. If that is not always true, you should append a second + to change the meaning to 0 or more spaces/tabs at the beginning of a line.

^([a-z^-]+^) ... matches a string consisting only of letters A-Z, a-z and the hyphen character and this part of the found string is tagged and therefore only this part of the found string is displayed in the function list view.

The next character must be a single space.

Then either the word division or the word section must follow, in any case. The space character between ^} and ^{ should be removed because there should be no space between the 2 arguments of the OR expression.

This expression will therefore not find any of the lines in your example.
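
For readers who want to try the first expression outside UltraEdit, here is a rough sketch in Python's re module. The translation from UltraEdit syntax (% to ^, ^t to \t, ^{A^}^{B^} to (A|B)), the use of re.IGNORECASE to imitate case-insensitive matching, and the sample lines are all assumptions made for illustration; UltraEdit's own engine may differ in details.

```python
import re

# Assumed Python equivalents of the UltraEdit expression (not UltraEdit syntax).
one_or_more = re.compile(r"^[ \t]+([a-z\-]+) (division|section)", re.IGNORECASE)
zero_or_more = re.compile(r"^[ \t]*([a-z\-]+) (division|section)", re.IGNORECASE)

# Hypothetical sample lines.
samples = [
    " procedure division.",       # indented: both variants match
    "procedure division.",        # not indented: only the * variant matches
    " working-storage section.",
    " 0000-main.",                # a paragraph name, not matched by this expression
]

def tagged(pattern, line):
    """Return the tagged part (group 1), or None when the line does not match."""
    m = pattern.match(line)
    return m.group(1) if m else None

for line in samples:
    print(repr(line), tagged(one_or_more, line), tagged(zero_or_more, line))
```

The only difference between the two variants is whether a line without leading whitespace is accepted.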


2) %[ ^t]++^([0-9a-z-_]++[~end-if .]^).^

% ... again start the search at beginning of a line.

[ ^t]++ ... find 0 or more spaces/tabs at beginning of a line, in other words preceding whitespaces are allowed and should be ignored.

[0-9a-z-_]++ ... should find a string consisting only of letters, digits, hyphens and underscores. The hyphen normally has no special meaning, except inside square brackets, where it means FROM x TO y. It is usually used for ranges such as 0-9 or a-z, but it can also define a range like !-/, which means all characters in the ASCII (or ANSI or Unicode) table from the exclamation mark to the slash. Therefore, inside square brackets the hyphen should always be escaped with ^ when simply the hyphen character itself is meant. In this expression the hyphen is surely read as a literal hyphen because the preceding letter z already belongs to a range definition, but it should nevertheless be escaped with a preceding ^. Also note that ++ is used instead of just +. ++ means that this part of the found string can also be an empty string with no characters. I'm quite sure that this is not what you want.

[~end-if .] ... is, in combination with the previous expression, the reason why the two lines you marked are not found. This expression means that the next character after 0 or more letters, digits or hyphens must NOT be the character e or E, the character n or N, one of the characters D to I in any case, the character f or F, a space, or a point. [~...] does not mean NOT the string inside the square brackets; it means not any of the characters listed inside the square brackets, with character ranges also possible. Both 0000-main. and 0050-validate-logon. end with the letter n directly before the point, and n is one of the excluded characters, so these two lines cannot match. Together with the previous expression this defines overlapping character classes, which is never good because the result looks weird to beginners, although it is fully explainable for experts in regular expressions.

The next character must be a point and the final escape character ^ is completely useless here.
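
The effect of the negated character class can be reproduced with a small Python sketch. The pattern below is an assumed Python rendering of the second UltraEdit expression ([~...] written as [^...], the useless trailing ^ dropped, re.IGNORECASE used to imitate case-insensitive matching), so treat it as an illustration and not as an exact model of UltraEdit's engine.

```python
import re

# Assumed Python rendering of: %[ ^t]++^([0-9a-z-_]++[~end-if .]^).
# [^end-if .] excludes the single characters e, n, d..i, f, space and period.
# It does NOT exclude the word "end-if".
pattern = re.compile(r"^[ \t]*([0-9a-z\-_]*[^end-if .])\.", re.IGNORECASE)

paragraphs = [
    "0000-main.",
    "0010-server-init.",
    "0020-assign-input-fields.",
    "0030-assign-output-fields.",
    "0040-standardize-sp-params.",
    "0050-validate-logon.",
    "0055-get-users-hipaa-unit.",
    "0060-process-request.",
    "0070-LOG-ACTIONS.",
]

for line in paragraphs:
    last_char = line[-2]  # the character directly before the final period
    found = pattern.match(line) is not None
    print(f"{line:30} last char {last_char!r} found={found}")

# The lines that are not found are exactly those ending in an excluded letter.
missed = [line for line in paragraphs if not pattern.match(line)]
print(missed)
```

In this sketch "end-if." is rejected as well, but only because f happens to be in the excluded set, which is why the list looked almost right at first.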


3) % 0000-^main

Well, this regular expression simply finds lines starting with " 0000-main". The escape character ^ is simply useless here because the next character m has no special regular expression meaning and the hyphen left is not inside a square bracket and therefore needs also no escape character to be interpreted as hyphen character.



I don't know anything about the syntax of COBOL, but I suggest the following:

/Function String = "%[ ^t]++^([a-z^-]+^) +^{division^}^{section^}"
/Function String 1 = "%[ ^t]++^([0-9]+-[a-z][a-z^-]++^)."

The second expression means in words:

  1. Find a string at the beginning of a line,
  2. with optionally preceding whitespaces,
  3. with a number with at least 1 digit,
  4. and a hyphen character after the number,
  5. and next character is a letter,
  6. and zero or more additional letters or hyphens,
  7. and a point.

This expression may already exclude strings which should be found, and it may find strings which should be ignored. If you can tell us in words the syntax rules for function strings in COBOL, post example code (best enclosed within BBCode [code]...[/code] tags) containing strings which should be found and strings which should not be found, and post in a second block what you want to see in the function list view for this example, we can find a better regular expression.
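
The seven steps above can be sketched in Python for quick experiments. The pattern is an assumed translation of the suggested UltraEdit expression into Python's re syntax with re.IGNORECASE, and the paragraph name 0100-step-2 is a hypothetical example added to show one kind of name the expression would miss.

```python
import re

# Assumed Python rendering of: %[ ^t]++^([0-9]+-[a-z][a-z^-]++^).
pattern = re.compile(r"^[ \t]*([0-9]+-[a-z][a-z\-]*)\.", re.IGNORECASE)

lines = [
    " 0000-main.",
    " 0050-validate-logon.",
    " 0070-LOG-ACTIONS.",
    "     end-if.",            # no leading number, so it is ignored
    " procedure division.",    # no leading number, so it is ignored
    " 0100-step-2.",           # hypothetical name: digit after the hyphen
]

def function_list(source_lines):
    """Collect the tagged part of every line matched by the pattern."""
    return [m.group(1) for m in map(pattern.match, source_lines) if m]

print(function_list(lines))
```

Because the last character class contains only letters and the hyphen, a name with digits or underscores after the first hyphen is not listed, which is one way the expression may exclude strings that should be found.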

A lookbehind to exclude end-if. is not possible with the UltraEdit regular expression engine. But there are possibly other methods to exclude them, or we can change to the Perl regular expression engine, which supports lookbehinds that evaluate a found string during the search in order to ignore it. An extreme example of such lookbehind usage in Perl regular expression function strings can be viewed in the topic "How to define a case sensitive function string search?"

Re: Function List - COBOL

Post by rmbunton » Fri Aug 05, 2011 5:01 pm

Mofi, first off, thank you very much for your help.

...

Here is some information on COBOL paragraph names. Got this from googling COBOL paragraph name length:

ILE COBOL Language Reference - COBOL Words

A COBOL paragraph is a COBOL word.

COBOL words must be character-strings from the set of letters, digits, the hyphen, and the underscore. (The hyphen and the underscore cannot appear as the first or last character, however.) In the ILE COBOL language, each lowercase letter is generally equivalent to the corresponding uppercase letter.

COBOL paragraphs are executable procedures that can be invoked or referenced in a COBOL "PERFORM" statement.

...

I took your suggestion to change regex engines from UltraEdit to Perl. Here is where I am, from an implementation point of view. I have updated my UltraEdit wordfile for COBOL to support Perl. Below is a code snippet of the changes in the wordfile. BTW, I got the COBOL wordfile from the UltraEdit download webpages.

Code:
/L20"Cobol" COBOL_LANG Line Comment Num = 2*  Nocase File Extensions = CBL COB CPY
/TGBegin "Function"
/TGFindStr = "^[ \t]*([a-z\-]+) +(division|section)"
/TGFindStr = "^[ \t]*([0-9a-z]+-[0-9a-z][0-9a-z\-]*)\.(?<!end-if\.)(?<!end-exec\.)(?<!end-perform\.)(?<!program-id\.)(?<!date-written\.)(?<!date-compiled\.)(?<!source-computer\.)(?<!object-computer\.)(?<!special-names\.)(?<!file-control\.)"
/TGEnd
/Regexp Type = Perl
/Delimiters = ~!@$%^&*()_+=|\/{}[]:;"'<> ,   .?/
/C1
accept access acquire actual add address advancing after all allowing alphabet alphabetic alphabetic-lower alphabetic-upper alphanumeric
...

The first regex line gets most of the standard COBOL divisions and sections. A COBOL program is logically divided into "divisions", which contain "sections". These are nice to have in a source outline.
The second regex line is my current attempt to get the COBOL "procedure" division paragraphs. These paragraphs represent the COBOL program's executable code, like the methods of a C++ object, except that there is no concept of parameter passing. All variables are global.
I have also attempted to eliminate some of the COBOL reserved words, for example "end-if", "end-exec", and "end-perform". COBOL reserved words are part of the ANSI-approved syntax. With these two regex statements things seem to work well, but I am not sure whether they are syntactically correct. With the above descriptions of COBOL words and paragraphs you might be able to tell me if I am close or not. Thanks in advance...

...

The following COBOL code snippet is what I am using for building my test outline:

Code:
 identification division.
     program-id.
     author.
     installation.
     date-written.
     date-compiled.

 environment division.

 configuration section.
     source-computer.
     object-computer.

     special-names.

 input-output section.
    file-control.

 data division.

 file section.

 working-storage section.

 linkage section.


 procedure division

 0000-main.

     if value-eof
        continue
     else
        if value-duplicate
           move zero to sqlcode
        else
           if value-failure
              move 999
           end-if
        end-if
     end-if.

 0010-server-init.

      exec sql
         select aaaa
         from bbbbbbb
      end-exec.

 0020-assign-input-fields.
 0030-assign-output-fields.
 0040-standardize-sp-params.
 0050-validate-logon.
 0055-get-users-hipaa-unit.
 0060-process-request.
 0070-LOG-ACTIONS.
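
For anyone who wants to check the second Perl expression without UltraEdit, the following Python sketch applies an equivalent pattern to a shortened version of the test snippet. It assumes that Python's re module treats these fixed-width negative lookbehinds like the Perl engine does, and that re.IGNORECASE plus re.MULTILINE are a fair stand-in for how the wordfile applies the expression line by line. The name 1000-show-program-id is a hypothetical paragraph added for illustration.

```python
import re

KEYWORDS = [
    "end-if", "end-exec", "end-perform", "program-id", "date-written",
    "date-compiled", "source-computer", "object-computer", "special-names",
    "file-control",
]

# Same chain of negative lookbehinds as in the wordfile: each one repeats
# the trailing period because the period has already been consumed.
lookbehinds = "".join(rf"(?<!{word}\.)" for word in KEYWORDS)
pattern = re.compile(
    r"^[ \t]*([0-9a-z]+-[0-9a-z][0-9a-z\-]*)\." + lookbehinds,
    re.IGNORECASE | re.MULTILINE,
)

# Shortened copy of the test snippet.
sample = """ identification division.
     program-id.
     date-written.
 procedure division.
 0000-main.
     if value-failure
        move 999
     end-if.
 0010-server-init.
      exec sql
      end-exec.
 0070-LOG-ACTIONS.
"""

print(pattern.findall(sample))

# Side effect of the lookbehinds: a paragraph whose name merely ENDS with
# one of the keywords is suppressed too (hypothetical name).
print(pattern.findall(" 1000-show-program-id.\n"))
```

The second call returns an empty list, so paragraph names ending in one of the listed reserved words would be missing from the function list with this approach.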

Re: Function List - COBOL

Post by Mofi » Sat Aug 06, 2011 6:15 am

Very good job! The Perl regular expressions are syntactically absolutely correct. This topic can now be helpful for other COBOL programmers using UltraEdit or UEStudio who want a well-working function list.

I have just a few suggestions for further improvement:

  1. The underscore character should be added to the character classes because it is a possible character even when you don't use it.
  2. The point at end of a paragraph can be specified in the second regular expression search string also at end of the expression which avoids the need to add the point to every negative lookbehind expression.
  3. As long as you don't really want to use a grouped function list, it is better to use the old style function string definitions in the wordfile for downwards compatibility. For you and all other users of UltraEdit v16.00 or later and of UEStudio v10.00 or later, it makes no difference when using the Flat List option, but it does for users of earlier versions of UE or UES, which do not support the new function string definitions.
  4. The list of delimiter characters contains the underscore, although the underscore should not be a word delimiting character according to the definition of COBOL words. Also, the slash character is present twice in the delimiters list.

So I suggest the following:

Code:
/L20"Cobol" COBOL_LANG Line Comment Num = 2*  Nocase File Extensions = CBL COB CPY
/Delimiters = ~!@$%^&*()+=|\/{}[]:;"'<> ,   .?
/Regexp Type = Perl
/Function String = "^[ \t]*([a-z\-_]+) +(division|section)"
/Function String 1 = "^[ \t]*([0-9a-z]+-[0-9a-z][0-9a-z\-_]*)(?<!end-if)(?<!end-exec)(?<!end-perform)(?<!program-id)(?<!date-written)(?<!date-compiled)(?<!source-computer)(?<!object-computer)(?<!special-names)(?<!file-control)\."
/C1
accept access acquire actual add address advancing after all allowing alphabet alphabetic alphabetic-lower alphabetic-upper alphanumeric
...

COBOL programmers copying above or below please note that the multiple spaces between , and . in the list of delimiter characters must be replaced by a tab character.

UltraEdit v13.10 or UEStudio v6.30 is at least needed for above Perl regular expression function strings in the wordfile.
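
Moving the point to the end of the expression should not change which lines are found. The Python sketch below compares both placements on a few sample lines. It assumes that Python's re module handles these fixed-width lookbehinds like the Perl engine in UltraEdit, and 0080-write_log is a hypothetical paragraph name used to exercise the added underscore.

```python
import re

KEYWORDS = [
    "end-if", "end-exec", "end-perform", "program-id", "date-written",
    "date-compiled", "source-computer", "object-computer", "special-names",
    "file-control",
]

BODY = r"^[ \t]*([0-9a-z]+-[0-9a-z][0-9a-z\-_]*)"
FLAGS = re.IGNORECASE | re.MULTILINE

# Variant A: point first, so every lookbehind has to repeat it.
point_first = re.compile(
    BODY + r"\." + "".join(rf"(?<!{w}\.)" for w in KEYWORDS), FLAGS
)
# Variant B: lookbehinds first, a single point at the very end.
point_last = re.compile(
    BODY + "".join(rf"(?<!{w})" for w in KEYWORDS) + r"\.", FLAGS
)

sample = """ 0000-main.
     end-if.
 0010-server-init.
      end-exec.
 0080-write_log.
     file-control.
"""

print(point_first.findall(sample))
print(point_last.findall(sample))
```

Both variants list the same three paragraphs here; variant B is simply shorter to write and maintain.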

Perhaps it is useful to use grouped function strings for COBOL files by defining 1 group for functions, another one for sections and a third for divisions.

Code:
/L20"Cobol" COBOL_LANG Line Comment Num = 2*  Nocase File Extensions = CBL COB CPY
/Delimiters = ~!@$%^&*()+=|\/{}[]:;"'<> ,   .?
/Regexp Type = Perl
/TGBegin "Sections"
/TGFindStr = "^[ \t]*([a-z\-_]+) +section"
/TGEnd
/TGBegin "Divisions"
/TGFindStr = "^[ \t]*([a-z\-_]+) +division"
/TGEnd
/TGBegin "Functions"
/TGFindStr = "^[ \t]*([0-9a-z]+-[0-9a-z][0-9a-z\-_]*)(?<!end-if)(?<!end-exec)(?<!end-perform)(?<!program-id)(?<!date-written)(?<!date-compiled)(?<!source-computer)(?<!object-computer)(?<!special-names)(?<!file-control)\."
/TGEnd
/C1
accept access acquire actual add address advancing after all allowing alphabet alphabetic alphabetic-lower alphabetic-upper alphanumeric
...
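
To preview what the three groups would contain, the grouped definitions can be imitated in Python. This is only a sketch: the dictionary stands in for the /TGBegin ... /TGEnd blocks, the patterns assume that Python's re module behaves like the Perl engine for these expressions, and the sample is a shortened version of the test code from this topic.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE
KEYWORDS = [
    "end-if", "end-exec", "end-perform", "program-id", "date-written",
    "date-compiled", "source-computer", "object-computer", "special-names",
    "file-control",
]

# One entry per /TGBegin group, with its /TGFindStr expression.
GROUPS = {
    "Sections": re.compile(r"^[ \t]*([a-z\-_]+) +section", FLAGS),
    "Divisions": re.compile(r"^[ \t]*([a-z\-_]+) +division", FLAGS),
    "Functions": re.compile(
        r"^[ \t]*([0-9a-z]+-[0-9a-z][0-9a-z\-_]*)"
        + "".join(rf"(?<!{w})" for w in KEYWORDS)
        + r"\.",
        FLAGS,
    ),
}

def grouped_function_list(source):
    """Return the tagged strings found by each group's expression."""
    return {name: pattern.findall(source) for name, pattern in GROUPS.items()}

sample = """ identification division.
     program-id.
 environment division.
 configuration section.
 input-output section.
     file-control.
 data division.
 working-storage section.
 procedure division.
 0000-main.
     end-if.
 0010-server-init.
"""

for name, entries in grouped_function_list(sample).items():
    print(name, entries)
```

Note that hyphenated section names such as input-output are not picked up by the Functions expression, because that expression requires the point directly after the name.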

Perhaps you are interested in further enhancing the user-contributed wordfile for COBOL, for example by adding indent/unindent and open/close fold strings to use the auto-indent and code folding features for COBOL files. Giving the color groups names would also be good. (I don't know what C3 to C5 are for.) end-exec is missing from the word list of color group 1, as I could see from your example code. And usually it is good to highlight the delimiter characters too. I have made all these enhancements (and sorted the delimiter characters) in cobol.uew from the extras download page and attached the improved wordfile here.

The ILE COBOL documentation looks good for further improving the syntax highlighting wordfile by adding additional words (and perhaps removing undocumented words), at least for ILE COBOL. If you are interested in further enhancing the COBOL wordfile and have finished it, please send the improved wordfile by email to IDM support with the request to replace the existing cobol.uew on their server. The wordfile sent to IDM should not contain any color and font style settings.
Attachments
cobol.zip
A slightly improved version of cobol.uew.