display files with variable hex record length

Post by cotthemh » Tue May 29, 2007 9:40 am

We use UltraEdit to edit large HEX files (>4 GB). The files contain a mixture of (fixed and variable) ASCII text and hex fields (generated by Micro Focus COBOL). Some of the hex fields contain 0A as a value, which UE interprets as a newline. As a consequence, the line count, and therefore the number of records shown for the file, is incorrect.

The variable fields (hex and ASCII) all have the same structure: 2 bytes for the length, followed by the actual data. The only way we see to determine whether a field is of fixed or variable length is the COBOL definition that specifies the content of the data.

Is there a way to formally specify these columns, so that we can mix fixed/variable ASCII/hex fields with each other, eventually giving a specification of the columns in some way?

How would you deal with these files? (They reside on a Unix server and are edited through a SAN-connected Windows server.)

Many thanks in advance for your assistance,

Herwig

Post by Mofi » Tue May 29, 2007 10:40 am

With "Only recognize DOS terminated lines (CR/LF) as new lines for editing" enabled at Configuration - File Handling - DOS/UNIX/MAC Handling, a single 0A without a preceding 0D is no longer interpreted as a new line. Maybe this helps.

If the file also contains NULL bytes, you should also enable "Allow editing of text files with HEX 00's without converting them to spaces" at Configuration - Editor - Advanced.

But if the file also contains binary values below hexadecimal 20 (space) other than 09 (tab), 0A (line-feed), 0C (form-feed) and 0D (carriage return), it would be better to open the file in hex edit mode instead of text mode.
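
The rule in the paragraph above can be checked outside the editor before deciding how to open a file. The following is only a minimal sketch in Python; the function name needs_hex_mode is an illustrative assumption and has nothing to do with UltraEdit's own binary detection.

```python
# Control bytes that the post above treats as acceptable in text mode:
# 09 (tab), 0A (line-feed), 0C (form-feed) and 0D (carriage return).
TEXT_CONTROL_BYTES = {0x09, 0x0A, 0x0C, 0x0D}


def needs_hex_mode(data: bytes) -> bool:
    """Return True if data holds any byte below 0x20 that is not a text control byte.

    Hypothetical helper: it only applies the rule of thumb from the post.
    """
    return any(b < 0x20 and b not in TEXT_CONTROL_BYTES for b in data)
```

For a multi-gigabyte file, one would feed it a sample (for example the first few megabytes) rather than the whole content.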

I have explained in the topic "UE 12.20 Want a simple hex editor, no smarts" how to open a file in hex edit mode when UltraEdit can't detect by itself that it is a binary file.

Post by cotthemh » Tue May 29, 2007 1:39 pm

Thanks for your fast reply,

The issue is a bit more tricky. As each line represents a record, we would like to see record xxx on line xxx. For about 95% of the file this editor works wonderfully; only in some cases do the hex parts of the record contain the same value as the line separator.

In all cases the line separator is 0A, so we need the CR/LF option set.
Some hex fields contain 0A, but this is always preceded by 00, so the settings for the first option are inappropriate as they stand.

So our first idea was simply to replace every 0A with 0D 0A (if not preceded by 00) and to enable the option to interpret CR/LF as you specified.

Unfortunately the filler of a field is also 00, so in some cases a real end of line is also preceded by 00.

Are there other ways to direct the editor to ignore specific fields, or do you know of a converter which converts this file to something editable?

Many thanks,

Herwig

Post by Mofi » Tue May 29, 2007 2:11 pm

Is your file a fixed-column file, meaning that the number of bytes per line is the same for all lines?

That would make it possible to insert the CR with File - Special Functions - Insert String At Every Increment.
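
What such an insertion at a fixed increment amounts to can be sketched in Python. This is not UltraEdit's implementation; the function name and the assumption that record_size counts every byte of a line including its terminating 0A are illustrative only.

```python
def insert_cr_before_lf(data: bytes, record_size: int) -> bytes:
    """Insert 0D before the final 0A of every fixed-size record.

    Assumption: each record is exactly record_size bytes long and its
    last byte is the 0A line terminator. Any 0A inside a record is left alone.
    """
    if record_size <= 0 or len(data) % record_size != 0:
        raise ValueError("data is not a whole number of records")
    out = bytearray()
    for pos in range(0, len(data), record_size):
        record = data[pos:pos + record_size]
        if record[-1] != 0x0A:
            raise ValueError(f"record at offset {pos} does not end with 0A")
        out += record[:-1] + b"\r\n"
    return bytes(out)
```

Because the position alone decides where the CR goes, an embedded 00 0A in a hex field does not matter here.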

If the line length is not fixed, converting the real line endings to DOS format by inserting a CR before every LF which is not part of a binary byte sequence could be the solution. A regular expression replace in files could do that before opening the file. For example, if the binary byte sequence can be identified by a special ASCII string and the number of binary bytes is constant, then it would be no problem to replace only the real 0A with 0D 0A.
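
As an illustration of that idea, here is a minimal Python sketch. Everything about the layout is invented for the example: the marker string "BIN:" and the count of 4 binary bytes are hypothetical placeholders, since the thread never states what the real files contain.

```python
import re

# Hypothetical layout: the ASCII marker "BIN:" introduces exactly 4 binary bytes.
MARKER = b"BIN:"
BINARY_LENGTH = 4

# Match either a whole marker + binary block (kept unchanged) or a bare LF.
_PATTERN = re.compile(re.escape(MARKER) + b".{%d}|\n" % BINARY_LENGTH, re.DOTALL)


def lf_to_crlf_outside_binary(data: bytes) -> bytes:
    """Replace 0A with 0D 0A, except inside the marked binary blocks."""
    def fix(match):
        token = match.group(0)
        return b"\r\n" if token == b"\n" else token
    return _PATTERN.sub(fix, data)
```

The alternation consumes each binary block as one match, so any 0A inside it is never seen as a line ending.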

Some example lines which cover all the normal and abnormal LFs in your file would help us to help you.

Post by cotthemh » Wed May 30, 2007 8:49 am

Here is a sample file attached. The issue is that the 0A is preceded by 00 both in some cases where it is an end of line and within some records, so we don't have a real regular expression to determine the actual EOL ...

I tried to attach the file as text, but I get a message that this extension is not allowed ...

Post by Mofi » Wed May 30, 2007 10:04 am

Attaching files to the IDM user forums is not possible, even if you see the form for doing it. You have to upload the file in a zip archive somewhere (there are lots of free hosting services) and post a link to the zip file.

Post by cotthemh » Wed May 30, 2007 10:35 am

Our company firewall does not allow the upload of anything.

The current file structure we have is:
2 bytes record length - some ASCII fields mixed with hex fields - 0A

In some cases we get:

2 bytes record length - some ASCII fields mixed with hex fields - 00 0A

The 00 comes from the filler of the last hex field,

and some of the hex fields contain 00 0A (which is thus misinterpreted as a linefeed).

The only way to split properly is to use the record length at the beginning,

so any suggestion is more than welcome,

Herwig
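
The length-based splitting described in this post can be sketched as follows in Python. The thread does not say how the 2 length bytes are encoded, so this sketch assumes a big-endian unsigned value that counts only the bytes between the length prefix and the terminating 0A; both points would have to be verified against the COBOL definition. The name iter_records is illustrative.

```python
import struct
from typing import Iterator


def iter_records(data: bytes) -> Iterator[bytes]:
    """Yield each record payload, trusting the length prefix over any 0A bytes.

    Assumed layout per record (not confirmed by the thread):
    2-byte big-endian length, that many payload bytes, then one 0A.
    """
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError(f"truncated length prefix at offset {pos}")
        (length,) = struct.unpack_from(">H", data, pos)
        start = pos + 2
        end = start + length
        if end >= len(data) or data[end] != 0x0A:
            raise ValueError(f"record at offset {pos} does not end with 0A")
        yield data[start:end]
        pos = end + 1
```

Counting the yielded records gives the real record count even when hex fields contain 0A or the last field ends in a 00 filler. The 0A check after each record doubles as a sanity test that the assumed length encoding is right.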

Post by jorrasdk » Wed May 30, 2007 12:31 pm

Ok here is another suggestion to give us example data:

- Open a new blank file in UE
- Use File - Special Functions - Insert File and insert your file
- Then activate hex edit mode (Ctrl+H)
- Select all (Ctrl+A)
- Then use Edit - Hex Functions - Hex Copy Selected View
- Then paste into a new blank file (delete some from the end if it is more than 100 lines)
- Then post the copied hex into a post in this thread, surrounded by code and /code tags.

It should look something like this:
00000000h: 00 18 61 73 63 69 69 64 61 74 61 00 00 00 00 00 ; ..asciidata.....
00000010h: 00 00 00 00 00 0A 00 21 6D 6F 72 65 20 61 73 63 ; .......!more asc
00000020h: 69 69 20 64 61 74 61 0A 2E 2E 2E 2E 2E          ; ii data......
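
If going through the editor is impractical for a multi-gigabyte file, a dump in roughly the same layout can be produced from the first bytes of the file with a few lines of Python. This is only a sketch that imitates the sample above; the function name hex_dump is an assumption and the output is not guaranteed to match UltraEdit's hex copy in every detail.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format data as offset, hex bytes and printable ASCII, 16 bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as a dot, as in the sample above.
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}h: {hex_part:<{width * 3 - 1}} ; {text_part}")
    return "\n".join(lines)
```

Reading only a prefix, for example with open(path, "rb").read(1600), keeps the output to about 100 lines.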

Post by jorrasdk » Wed May 30, 2007 2:30 pm

I think I see a pattern "0A 00 00 34 00", which is the record delimiter (0A) followed by some kind of fixed record header "00 00 34 00".

So if you followed Mofi's recommendations:
Mofi wrote: With "Only recognize DOS terminated lines (CR/LF) as new lines for editing" enabled at Configuration - File Handling - DOS/UNIX/MAC Handling, a single 0A without a preceding 0D is no longer interpreted as a new line. Maybe this helps.

If the file also contains NULL bytes, you should also enable "Allow editing of text files with HEX 00's without converting them to spaces" at Configuration - Editor - Advanced.


Then enter hex edit mode (Ctrl+H) and do a find/replace:

Find: 0A00003400
Replace: 0D0A00003400
(make sure "Find ASCII" is unchecked).

Then I imagine the embedded 0A's are untouched and you now have standard Windows line terminators: CR LF.
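
The same byte-level replace can be tried outside the editor. The Python sketch below works on data already held in memory, which is an assumption that does not hold for a file above 4 GB; such a file would need to be processed in chunks with care at chunk boundaries. The function name is illustrative.

```python
# Byte patterns taken from the find/replace above.
DELIMITER = bytes.fromhex("0A00003400")
REPLACEMENT = bytes.fromhex("0D0A00003400")


def mark_record_ends(data: bytes) -> bytes:
    """Insert 0D before every 0A that is followed by the header 00 00 34 00."""
    return data.replace(DELIMITER, REPLACEMENT)
```

Note that the 0A of the very last record is not followed by a header and therefore stays a bare LF with this approach.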

Post by cotthemh » Fri Jun 01, 2007 6:47 am

I would need to check that on a number of other files, but at least in theory we have no guarantee that this pattern does not occur in the hex zones.

Secondly, this forces a replace on large files, so it's quite likely more effective to just convert the file to an editable format (the first 2 bytes of each line contain the length of the line, so a converter would just need to replace the right characters in the right place).

We will keep you posted,
H

Post by Mofi » Fri Jun 01, 2007 7:29 am

A JavaScript script could evaluate the byte count of each line and insert the 0D at the correct position in every line. But such a script would be horribly slow on your very large files. A macro can't do this job.

Maybe you had better write a small C or C++ console application which does the job. The program could copy the file while inserting a 0D according to the byte counter of each line. That is a very simple program which needs only a few lines of code.
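
The copy loop of such a converter could look roughly like this, sketched here in Python rather than in the C or C++ suggested above. It streams record by record, so file size is not a memory concern. The record layout is an assumption not confirmed in this thread: a 2-byte big-endian length that counts only the payload, followed by the payload and one 0A. The function names are illustrative.

```python
import io
import struct
from typing import BinaryIO


def convert_to_crlf(src: BinaryIO, dst: BinaryIO) -> int:
    """Copy src to dst, writing 0D 0A at each real record end; return record count.

    Assumed layout per record: 2-byte big-endian payload length, payload, 0A.
    """
    count = 0
    while True:
        header = src.read(2)
        if not header:
            return count
        if len(header) < 2:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">H", header)
        body = src.read(length + 1)  # payload plus the terminating 0A
        if len(body) != length + 1 or body[-1] != 0x0A:
            raise ValueError(f"record {count + 1} does not end with 0A")
        dst.write(header + body[:-1] + b"\r\n")
        count += 1


def convert_bytes(data: bytes) -> bytes:
    """Convenience wrapper for trying the converter on a small in-memory sample."""
    out = io.BytesIO()
    convert_to_crlf(io.BytesIO(data), out)
    return out.getvalue()
```

With real files, src and dst would be opened with open(path, "rb") and open(path, "wb"). Under the stated assumption the length prefix stays valid after conversion, because the inserted 0D is not part of the payload.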

BTW: I can't see where the 2 bytes for the record length are in your files. The number of bytes between each 0A varies, but where are the 2 bytes indicating this length? Can you post the byte offsets where the record lengths are stored in your example?

Can you modify the Micro Focus COBOL program to write \r\n instead of only \n when creating the files? However, 0D 0A could also occur in the hex part.