Search specific pattern in columns with specific header

Post by HansFink » Wed May 30, 2007 8:39 am

Hi,

I have many text files which contain columns with headers and data. The position of the header names can be different from file to file so I cannot use fixed column numbers.

I need a macro that finds a column header (HEADER1) and then searches that column for a pattern like ?.??? (values with 3 decimals).
All files which do not contain this pattern should be closed.

Is that possible?

Post by Mofi » Thu May 31, 2007 6:25 am

Yes, that is possible. I just need 1 or 2 examples of your source files (header line plus a few content lines), ideally enclosed in BBCode tags [code][/code].

Are your files CSV or fixed column files?

It is not a problem that the header is not always at the same position. The macro can search the first line of the file for "HEADER1" and, when it is found, convert a copy of that line into a regular expression string. How this is done depends on the format of the file: CSV or fixed column. This regular expression string, extended with [0-9].[0-9][0-9][0-9], is then used to find the number in the correct column. If nothing is found, the file is closed and the macro continues with the next file until all files have been evaluated.

Your version of UltraEdit is also important for that macro.

Post by HansFink » Thu May 31, 2007 9:52 am

OK, here is a typical text file:

Code: Select all
NAME            A11B4
DESCRIPTION     ABC
TABELLE
Description     ABC     CBA     HEADER1         s-out
A1174-3123      3       1.5     0.004455        3
A1174-4123      4       10      1.0866          4
A1174-5123      5       1.5     1.013459        5
COUNT
1174-3123       1       0


Lines 1 to 3 are the general header of the file.

The lines from Description to A1174-5 are the main area containing the data.
The Description line contains the header names like HEADER1;
the data follows below it.

The lines from COUNT to the end are the footer.

The columns are separated by tabs. The HEADER1 column of every file should be searched for values with 3 decimals (?.???). All open files that do not contain at least one such line should be closed.
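
Outside UltraEdit, the check described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not part of any macro: the function name `has_three_decimal_value` is made up, columns are assumed to be separated by exactly one tab, and ?.??? is read strictly as one digit, a dot, and exactly three digits.

```python
import re

# Assumption: "?.???" means one digit, a dot, and exactly three digits.
THREE_DECIMALS = re.compile(r"\d\.\d{3}")


def has_three_decimal_value(text: str, header: str = "HEADER1") -> bool:
    """Return True if any row below the header line holds a ?.??? value
    in the column named `header` (hypothetical helper, tab-separated input)."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        cells = line.split("\t")
        if header not in cells:
            continue
        column = cells.index(header)
        # Only rows after the header line can carry data for this column.
        for row in lines[index + 1:]:
            values = row.split("\t")
            if len(values) > column and THREE_DECIMALS.fullmatch(values[column]):
                return True
        return False
    return False  # no header line at all


# The sample file from this post, with tabs restored between columns.
sample = "\n".join([
    "NAME\tA11B4",
    "DESCRIPTION\tABC",
    "TABELLE",
    "Description\tABC\tCBA\tHEADER1\ts-out",
    "A1174-3123\t3\t1.5\t0.004455\t3",
    "A1174-4123\t4\t10\t1.0866\t4",
    "A1174-5123\t5\t1.5\t1.013459\t5",
    "COUNT",
    "1174-3123\t1\t0",
])

print(has_three_decimal_value(sample))
```

Under this strict reading, none of the three HEADER1 values in the sample has exactly three decimals, so the sample itself would be one of the files to close.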

Post by Mofi » Thu May 31, 2007 1:40 pm

Okay, the following macro hopefully does the job. I am assuming there is always exactly 1 tab between the columns, so that this part of the file is like a CSV file with the tab as delimiter.

Make sure all open files are saved before running the macro. The macro has to modify every file temporarily, but it does not really change the contents. All files that remain open are flagged as modified even though the macro leaves their contents unchanged (except that a missing line termination at the end of the file is added).

The macro property "Continue if a Find with Replace not found" must be checked for this macro.

InsertMode
ColumnModeOff
HexOff
UnixReOff
Clipboard 9
Top
"ThIs Is ThE FiRsT FiLe!"
NextWindow
Loop
Bottom
IfColNum 1
Else
"
"
EndIf
Top
Find MatchCase "HEADER1"
IfFound
Key Ctrl+LEFT ARROW
StartSelect
Key HOME
Copy
EndSelect
Top
Paste
"
"
Key UP ARROW
SelectLine
Find RegExp "[~^t^p]+^t"
Replace All SelectText "*^^^^t"
EndSelect
Top
"%"
Key END
"[0-9].[0-9][0-9][0-9][^t^r^n]"
StartSelect
Key HOME
Cut
EndSelect
DeleteLine
Find RegExp "^c"
IfNotFound
Top
EndIf
Else
Top
EndIf
IfSel
Top
Find MatchCase "ThIs Is ThE FiRsT FiLe!"
Replace ""
IfFound
Find RegExp "^c"
ExitLoop
EndIf
Find RegExp "^c"
NextWindow
Else
Find MatchCase "ThIs Is ThE FiRsT FiLe!"
Replace ""
IfFound
CloseFile NoSave
ExitLoop
Else
CloseFile NoSave
EndIf
EndIf
EndLoop
ClearClipboard
Clipboard 0

Here is the macro again in UEM format with comments. See "Macro examples and reference for beginners and experts" for how to set up UltraEdit to best view macro code in this format. I have replaced every tab with 4 spaces (command Tabs To Spaces) to get correct HTML output here.

Code: Select all
InsertMode
ColumnModeOff
HexOff
UnixReOff
Clipboard 9
//  Mark the first file with a special string to know when to exit the loop.
Top
"ThIs Is ThE FiRsT FiLe!"
/*! The first file must be evaluated as last file because it probably does
    not contain the string of interest. The macro then could not close it to
    avoid an endless loop, although it should be closed. So better evaluate
    the first file as last file. !*/
NextWindow
Loop
/*! Insert a line termination at end of the file if last line is not already terminated.
    This is necessary when the column HEADER1 is the last column and so after ?.??? the
    line termination follows. !*/
    Bottom
    IfColNum 1
    Else
        "
        "
    EndIf
    Top
/*! Back at top of the file search for the header. If not found, ignore this file and
    later close it, because it surely does not contain ?.??? in the requested column. !*/
    Find MatchCase "HEADER1"
    IfFound
/*! Header found! Copy everything from start of the current line
    to beginning of HEADER1 into a new line at top of the file. !*/
        Key Ctrl+LEFT ARROW
        StartSelect
        Key HOME
        Copy
        EndSelect
        Top
        Paste
        "
        "
        Key UP ARROW
        SelectLine
/*! Convert now this part of the header line into an UltraEdit style regular expression
    with the required part to find ?.??? at end of the column or line, if the HEADER1
    column is the last column. A header line like

    Description     ABC     CBA     HEADER1     ...

    will be converted into

    %*^t*^t*^t[0-9].[0-9][0-9][0-9][^t^r^n]

!*/
        Find RegExp "[~^t^p]+^t"
        Replace All SelectText "*^^^^t"
        EndSelect
        Top
        "%"
        Key END
        "[0-9].[0-9][0-9][0-9][^t^r^n]"
//  Copy this line into the user clipboard 9 and delete the line.
        StartSelect
        Key HOME
        Cut
        EndSelect
        DeleteLine
//  Search for the regular expression in the clipboard. This works only with UE style.
        Find RegExp "^c"
//  This useless looking code is necessary for the second Find/Replace in the Else branch.
        IfNotFound
            Top
        EndIf
//  This useless looking code is necessary for the second Find/Replace in the Else branch.
    Else
        Top
    EndIf
    IfSel
/*! The regular expression has found ?.??? in the correct column. So don't close
    this file, but exit the loop when this file is the first/last file to evaluate.
    But before always position the cursor to the string of interest. !*/
        Top
        Find MatchCase "ThIs Is ThE FiRsT FiLe!"
        Replace ""
        IfFound
            Find RegExp "^c"
            ExitLoop
        EndIf
        Find RegExp "^c"
        NextWindow
    Else
/*! No HEADER1 or no ?.??? in the column of HEADER1 - close the file. But first check
    if this file is the first/last file to evaluate and exit the loop if this is true. !*/
        Find MatchCase "ThIs Is ThE FiRsT FiLe!"
        Replace ""
        IfFound
            CloseFile NoSave
            ExitLoop
        Else
            CloseFile NoSave
        EndIf
    EndIf
EndLoop
ClearClipboard
Clipboard 0
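
The conversion described in the macro comments, from the header text before HEADER1 to an anchored expression, can be mirrored in Python by readers who want to try the idea outside UltraEdit. This is a sketch, not the macro itself: `build_column_pattern` is a hypothetical name, exactly one tab between columns is assumed, and `[^\t\r\n]*` is used in place of UltraEdit's `*` so that a placeholder cannot run across a tab.

```python
import re


def build_column_pattern(header_line: str, header: str = "HEADER1") -> re.Pattern:
    """Build a regex that matches a line whose `header` column holds ?.???.

    Hypothetical helper; assumes exactly one tab between columns.
    """
    column = header_line.split("\t").index(header)  # ValueError if missing
    skip = r"[^\t\r\n]*\t" * column                 # one placeholder per earlier column
    # One digit, a dot, three digits, then a tab or the end of the line.
    return re.compile("^" + skip + r"\d\.\d{3}(?=[\t\r\n]|$)", re.MULTILINE)


pattern = build_column_pattern("Description\tABC\tCBA\tHEADER1\ts-out")
print(pattern.pattern)
print(bool(pattern.search("A1174-3123\t3\t1.5\t0.123\t3")))  # value in the HEADER1 column
print(bool(pattern.search("A1174-3123\t3\t0.123\t1.5\t3")))  # value one column too early
```

Counting columns instead of rewriting the header text is a design choice of this sketch; it sidesteps the question of what each header contains.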

Post by HansFink » Mon Jun 04, 2007 5:26 am

Thanks Mofi,
I tried your macro but it doesn't work for me. It always closes all files, including those that contain the search string (?.???).

To make things hopefully easier, the following conditions are given:

- All open files already contain the required header (HEADER1), so after the data of HEADER1 is a tab (?.???^t)

- HEADER1 is never the first or last column

- The columns are not of fixed size

- The column number of HEADER1 can differ from file to file

- Column separator is one TAB between data, but can be several tabs between headers (headers can be missing, but not HEADER1)

Post by jorrasdk » Mon Jun 04, 2007 6:33 am

Strange. I see no problems with Mofi's macro. I reproduced three files from the specs above: 2 that are supposed to be closed and one that stays open because it contains the pattern ?.???. And the macro did exactly what it was supposed to do.

So maybe the next step is for you to zip 2 files: one with the ?.??? pattern and one without. Upload the zip file to a server or service of your choice and post a link to it. (Zip files cannot be uploaded to this forum.)

Post by Mofi » Mon Jun 04, 2007 6:57 am

HansFink wrote:- All open files already contain the required header (HEADER1), so after the data of HEADER1 is a tab (?.???^t)


That is already handled by the macro. 4 lines could be removed from the macro, but to be safe I would not do that.

HansFink wrote:- HEADER1 is never the first or last column


Then you can remove ^r^n from the search string and this part of the code:

Bottom
IfColNum 1
Else
"
"
EndIf


HansFink wrote:- The columns are not of fixed size

- The column number of HEADER1 can differ from file to file


That is what the macro is designed for. It uses the tabs to identify the correct column.

HansFink wrote:- Column separator is one TAB between data, but can be several tabs between headers (headers can be missing, but not HEADER1)


I think this is the problem, because I assumed that there is always at least 1 character between the tabs in the header line and that no column has an empty header.

Insert the following below the command SelectLine to also handle empty column headers (not tested):

Find "^t"
Replace All SelectText "#^t"
Top
SelectLine


As you can see, 1 character is now always inserted before every tab, and then the line is reselected for the following regular expression replace. That replace converts the header line into a regular expression string that now has a correct *^t for every column, even for columns without a header.
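
Why an empty header can break the conversion is easy to reproduce with a small Python analog. It is only a sketch: `header_prefix_to_pattern` is a hypothetical function, Python's `re` module stands in for UltraEdit's regular expression engine, and the header prefix and data row are made up for illustration.

```python
import re


def header_prefix_to_pattern(prefix: str, pad_empty: bool) -> str:
    """Turn the header text before HEADER1 into a Python regex (hypothetical helper)."""
    if pad_empty:
        # Same idea as Find "^t" / Replace "#^t": every column gets one character.
        prefix = prefix.replace("\t", "#\t")
    # Analog of replacing "[~^t^p]+^t": each non-empty header plus its tab
    # becomes "any text, then a tab". A tab with nothing before it is left as is.
    body = re.sub(r"[^\t\r\n]+\t", lambda m: r"[^\r\n]*\t", prefix)
    return "^" + body + r"\d\.\d{3}\t"


# Made-up header prefix with an empty third header, and a data row for it.
prefix = "Name\tA\t\t"
row = "x1\t3\t0\t0.018\t4"

unpadded = header_prefix_to_pattern(prefix, pad_empty=False)
padded = header_prefix_to_pattern(prefix, pad_empty=True)

print(re.search(unpadded, row) is not None)
print(re.search(padded, row) is not None)
```

Without padding, the leftover tab in the expression demands two consecutive tabs in the data row, which a row with a value in every column does not have. With padding, each column gets its own placeholder whether it has a header or not.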

If the macro is still not working, please upload some example files in a zip-archive and post a link to it as suggested by jorrasdk.

Post by HansFink » Mon Jun 04, 2007 7:45 am

Thanks again, but it still closes all files. Maybe my version of UltraEdit is not compatible (11.20b) or my example file is not good enough.

Here are two better file examples.

S_1.txt contains search string and should remain open
There is no header for column 4 in line 6
Code: Select all
NAME          S_1.5 2-3x
SHORTNAME     S_1
TRANS         0
TABELLE
tName         A             ABC           D_IC
Description   A             Abc                         HEADER1       s-out
S_100 1010    3             26            0             0.018         4
S_125 1010    3.25          28            0             0.03          4
S_130 1010    3.3           32            0             0.0186        4
CONT
S_100 1010    0
S_125 1010    1
S_130 1010    1


R_1.txt does not contain search string and should be closed
Code: Select all
NAME          T_1.5 2-3x
SHORTNAME     T_1
TRANS         0
TABELLE
typeName      A             ABC           XYZ
Description   A             Abc           Xyz           HEADER1       s-out
T_1-4         4             5.7           17            0.2           4
T_1-5         5             5             18.5          0.2           5
T_1-1/4IN     6.35          6.35          21.175        0.2           6.35
CONT
T_1-4         1
T_1-5         1
T_1-1/4IN     1

Post by Mofi » Mon Jun 04, 2007 8:13 am

I have tested the macro with the suggested modifications, as you can see below, and it worked perfectly. S_1.txt remains open and the line with 0.018 is selected from the start of the line to the tab after 0.018. I have tested it with UltraEdit v11.20b too.

InsertMode
ColumnModeOff
HexOff
UnixReOff
Clipboard 9
Top
"ThIs Is ThE FiRsT FiLe!"
NextWindow
Loop
Top
Find MatchCase "HEADER1"
IfFound
Key Ctrl+LEFT ARROW
StartSelect
Key HOME
Copy
EndSelect
Top
Paste
"
"
Key UP ARROW
SelectLine
Find "^t"
Replace All SelectText "#^t"
Top
SelectLine

Find RegExp "[~^t^p]+^t"
Replace All SelectText "*^^^^t"
EndSelect
Top
"%"
Key END
"[0-9].[0-9][0-9][0-9]^t"
StartSelect
Key HOME
Cut
EndSelect
DeleteLine
Find RegExp "^c"
IfNotFound
Top
EndIf
Else
Top
EndIf
IfSel
Top
Find MatchCase "ThIs Is ThE FiRsT FiLe!"
Replace ""
IfFound
Find RegExp "^c"
ExitLoop
EndIf
Find RegExp "^c"
NextWindow
Else
Find MatchCase "ThIs Is ThE FiRsT FiLe!"
Replace ""
IfFound
CloseFile NoSave
ExitLoop
Else
CloseFile NoSave
EndIf
EndIf
EndLoop
ClearClipboard
Clipboard 0

The 2 files look as follows after converting the spaces into tabs with a very simple regular expression. » is a tab, · is a normal space, and every line ends with a DOS line termination.

NAME»S_1.5·2-3x
SHORTNAME»S_1
TRANS»0
TABELLE
tName»A»ABC»D_IC
Description»A»Abc»»HEADER1»s-out
S_100·1010»3»26»0»0.018»4
S_125·1010»3.25»28»0»0.03»4
S_130·1010»3.3»32»0»0.0186»4
CONT
S_100·1010»0
S_125·1010»1
S_130·1010»1

NAME»T_1.5·2-3x
SHORTNAME»T_1
TRANS»0
TABELLE
typeName»A»ABC»XYZ
Description»A»Abc»Xyz»HEADER1»s-out
T_1-4»4»5.7»17»0.2»4
T_1-5»5»5»18.5»0.2»5
T_1-1/4IN»6.35»6.35»21.175»0.2»6.35
CONT
T_1-4»1
T_1-5»1
T_1-1/4IN»1
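
As a cross-check of the two listings, the HEADER1 values can be pulled out with a short Python sketch. The helper name `header1_values` is hypothetical, the listings are shortened to the header line and the lines after it, and ?.??? is again read as one digit, a dot, and exactly three digits.

```python
import re
from typing import List


def header1_values(listing: str, header: str = "HEADER1") -> List[str]:
    """Collect the values below `header` (hypothetical helper).

    The listing is expected to show tabs as » and spaces as ·.
    """
    text = listing.replace("»", "\t").replace("·", " ")
    lines = text.splitlines()
    for index, line in enumerate(lines):
        names = line.split("\t")
        if header in names:
            column = names.index(header)
            rows = [row.split("\t") for row in lines[index + 1:]]
            # Lines with too few cells (such as the footer) are skipped.
            return [cells[column] for cells in rows if len(cells) > column]
    return []


s_1 = """Description»A»Abc»»HEADER1»s-out
S_100·1010»3»26»0»0.018»4
S_125·1010»3.25»28»0»0.03»4
S_130·1010»3.3»32»0»0.0186»4
CONT
S_100·1010»0"""

r_1 = """Description»A»Abc»Xyz»HEADER1»s-out
T_1-4»4»5.7»17»0.2»4
T_1-5»5»5»18.5»0.2»5
T_1-1/4IN»6.35»6.35»21.175»0.2»6.35
CONT
T_1-4»1"""

for name, listing in (("S_1.txt", s_1), ("R_1.txt", r_1)):
    values = header1_values(listing)
    hits = [v for v in values if re.fullmatch(r"\d\.\d{3}", v)]
    print(name, values, "keep open" if hits else "close")
```

Splitting on tabs keeps the empty header in S_1.txt as an empty cell, so HEADER1 still lines up with the fifth value of each data row.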

Post by HansFink » Mon Jun 04, 2007 10:05 am

Now it works, thanks. That saves a lot of time.