Counting TABs in a line

Postby treedude2525 » Tue Jan 30, 2007 8:45 pm

I'd like to create a macro to count tabs in each line of a text file. If the line has 4 tabs (desired), the macro goes to the next line w/o marking. If the line has more or fewer than 4 tabs, the macro marks the line (say "***" at the end of the line) and moves on.

I've only written a couple of UE macros -- I have some VBA macro experience w/ M$ eXcel and word, but I used to write a lot of macros for WP4.0, 5.1, and 6.x > for Win. Dang, I miss PerfectScript.

Peace,
- Sequoia

Re: Counting TABs in a line

Postby Mofi » Wed Jan 31, 2007 8:41 am

The following macro should do the job. It is designed to run on a DOS file (^p = CRLF). It first checks whether the last line of the file has a line termination and, if not, appends one.

Next it converts the character » at start of every line to a special string because the » is later used to mark the lines which have exactly 4 tabs.

The next two regular expressions insert the marker character » at the start of every line which has exactly 4 tabs.

The fourth regular expression then appends "***", as requested, to every line without the marker character » at the start of the line.

Next the marker character » at the start of every line is removed, and finally the special escape string is converted back to the character ». You can remove the first and the last regular expression Find and Replace All if your file never contains the character » at the start of a line.

The macro property Continue if a Find with Replace not found must be checked for this macro.

InsertMode
ColumnModeOff
HexOff
UnixReOff
Bottom
IfColNum 1
Else
"
"
EndIf
Top
Find RegExp "%»"
Replace All "MaRkErChAr"
Find RegExp "%^([~^t^p]++^t[~^t^p]++^t[~^t^p]++^t[~^t^p]++^t^)$"
Replace All "»^1"
Find RegExp "%^([~^t^p]++^t[~^t^p]++^t[~^t^p]++^t[~^t^p]++^t[~^t^p]+^)$"
Replace All "»^1"
Find RegExp "%^([~»]*^)$"
Replace All "^1***"
Find RegExp "%»"
Replace All ""
Find MatchCase RegExp "%MaRkErChAr"
Replace All "»"

Add UnixReOn or PerlReOn (v12+ of UE) at the end of the macro if you do not use UltraEdit style regular expressions by default - see search configuration. Macro command UnixReOff sets the regular expression option to UltraEdit style.
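
For readers who find the chain of replaces hard to follow, the same mark-then-flag idea can be sketched in Python. This is only an illustration under assumptions, not part of the macro: flag_lines, MARK and ESCAPE are names invented here, and the passes run line by line instead of as whole-file Replace All commands.

```python
import re

MARK = "\u00bb"        # the character », used as a temporary marker
ESCAPE = "MaRkErChAr"  # stands in for a real » at the start of a line


def flag_lines(lines, wanted=4):
    """Append *** to every line whose tab count differs from `wanted`."""
    # Pass 1: protect a genuine » at the start of a line.
    lines = [ESCAPE + l[1:] if l.startswith(MARK) else l for l in lines]
    # Pass 2: prefix the marker to lines with exactly `wanted` tabs.
    exact = re.compile(rf"(?:[^\t]*\t){{{wanted}}}[^\t]*")
    lines = [MARK + l if exact.fullmatch(l) else l for l in lines]
    # Pass 3: every line still lacking the marker gets *** appended.
    lines = [l if l.startswith(MARK) else l + "***" for l in lines]
    # Pass 4: remove the temporary marker again.
    lines = [l[1:] if l.startswith(MARK) else l for l in lines]
    # Pass 5: restore the protected » characters.
    return [MARK + l[len(ESCAPE):] if l.startswith(ESCAPE) else l for l in lines]


if __name__ == "__main__":
    for line in flag_lines(["a\tb\tc\td\te", "x\ty", "\t\t\t\t"]):
        print(repr(line))
```

As in the macro, a line that really begins with the escape string would be converted by mistake, so the placeholder has to be something that never occurs in the file.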

Re: Counting TABs in a line

Postby treedude2525 » Wed Jan 31, 2007 5:07 pm

Thanks for the code, Mofi, but I should have explained better. I want to evaluate each line of text in the file to make sure that it has a TOTAL of 4 tabs, which may or may not be sequential (i.e., the tabs may be in different positions within the line and may be separated by other text).

The macro you wrote seems to be designed to find 4 tabs in sequence. Also, it marks some lines that do have the 4 tabs in sequence, doesn't mark others, and leaves an unknown character (looks like a box) at the end of some lines. I've attached a sample file before and after running the macro. I apologize for all the X's, but I had to replace potentially company-confidential text with nonsense quickly. (Note: I tried to attach a file, but it did not work. E-mail me through the forum if you'd like me to send the sample file directly.)

Does the UE macro facility have variables? I'd like to be able to do something like (I know some of the commands below are not UE commands, and furthermore I'm mixing syntax from various programming languages -- it is just for illustration purposes):

Code:
Set Variable TabCount = 0

While Not at end of file

  Sub: CountTabs
  While NextCharacter is not ^p
     If NextCharacter is ^t
       Set Variable TabCount = (TabCount + 1)
     End If
  Key RIGHT ARROW
  EndWhile
  If TabCount <> 4
       ***
  EndIf
  Key DOWN ARROW
  Set Variable TabCount = 0
  EndSub

 GoSub CountTabs

EndWhile


I know that's sloppy, but I hope it gets the point across.
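
For comparison, the counting loop described above could look roughly like this in a language that does have variables. It is a sketch in Python, not UE macro code; mark_bad_lines and its parameters are hypothetical names, and the lines are assumed to have been read already, without their line terminators.

```python
def mark_bad_lines(lines, wanted=4, flag="***"):
    """Append `flag` to each line whose number of tabs differs from `wanted`."""
    result = []
    for line in lines:
        tab_count = 0              # reset the counter for every line
        for ch in line:
            if ch == "\t":
                tab_count += 1
        if tab_count != wanted:
            line += flag           # mark the line at its end
        result.append(line)
    return result


if __name__ == "__main__":
    for line in mark_bad_lines(["a\tb\tc\td\te", "a\tb"]):
        print(repr(line))
```

The tabs are counted wherever they occur in the line, so it makes no difference whether they are adjacent or separated by other text.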

Thank you so much for your help.

Peace,
- Sequoia

Re: Counting TABs in a line

Postby Mofi » Thu Feb 01, 2007 12:36 pm

treedude2525 wrote:I want to evaluate each line of text in the file to make sure that it has a TOTAL of 4 tabs, which may or may not be sequential (i.e., the tabs may be in different positions within the line and may be separated by other text).

The macro you wrote seems to be designed to find 4 tabs in sequence.


No! I wrote the macro to do exactly what you want. I quickly created a test file which contained lines with fewer than 4 tabs, lines with more than 4 tabs, and lines with exactly 4 tabs, some in sequence and some with text between. The macro worked perfectly with UE v11.20b and UE v12.20b+1.

treedude2525 wrote:Also, it marks some lines that do have the 4 tabs in sequence, doesn't mark others, and leaves an unknown character (looks like a box) at the end of some lines.


It looks like your version of UltraEdit has a problem with the regex replaces or your file format is different. I tested it with an ASCII DOS file.

treedude2525 wrote:Does the UE macro facility have variables?


No, because the UltraEdit macro interface is not a real scripting language. That's why I developed the macro with regex searches. Your code example could also be realized with a UE macro, but it would be extremely complicated and extremely slow.

Post some lines of your file enclosed in [code][/code] and use, for example, # as a placeholder for every tab in your example. I can convert the # characters back to real tabs with a simple search and replace all. And tell me which format the file has - see the second box in the status bar of the UltraEdit window.

Re: Counting TABs in a line

Postby treedude2525 » Thu Feb 01, 2007 4:12 pm

Hi Mofi,

First, I really appreciate your help. Thanks for the forum tip on code/code. Now I know how not to have the spaces "eaten" in my code examples.

I'm working w/ UE ver. 10.20c -- I know it is a few generations behind the current ver., but my company isn't keen on spending money on software upgrades for "odd-ball" programs (I think I'm the only person in the office who uses UE at all, and one of the few outside of IT who even know what a text editor is). The file format is DOS. When I reopened my example file, it converted it back to DOS and stripped the "box" characters.

Here is the example w/ hash marks substituted for tabs:

Code:
// before running macro

XX XXXXXXX (XXX / #XX) XXXXXXXXX XX XXXXXXXX XX XX XX/XX/XXXX####
XXXXXXXXX#XXXXXXXX#XXXXXXX#XXXXXX#XXXXXX XXXXXXXXXXX
XXX:####XXXXXXXXXX
XXXX XXXX:####XXXXXXXXXX
XXX-XXXXX:####XXXXXXXXXX
XXXX XXXXX:####XXXXXXXXXX
XXXX XXXXX:####XXXXXXXXXX
XXXX XXXXXX XXXXXXXX:####XXXXXXXXXX
XXX XXXXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXX XXXXXXXXX XXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXX XXXXXXXXXXX XXXXX:###XX/XX/XX#XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXX XXXXXXXX:###XX/XX/XX#XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX XXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX XXXX XXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX X XXXXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXX XXXXXXXXXXX XXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX XXXXXX XXXXXXXX/XXXXXXXX:####XXXXXXXXXX
XXXXXXXXXXXX XXXXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXXXX XXXX ##XXXXXXXX:####XXXXXXXXX XXXXX - XXXXX XXXXX
XXX#XXXXXXXXX XXXXXXXXX XXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXX XXXXXXXXXX XXXXX:####XXXXXXXXXX
XXXXX XXXXXXX XXXXXXXX:####XXXXXXXXXX
XXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX XXXXXXXX - XX XXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXX/XXXXX XXXXXXXX:####XXXXXXXXXX
XXXXXX XXXXXXXX:####XXXXXXXXXX
XXXXX/XXXXXX XXXXX:###XX/XX/XX#XXXXXXXXXX
XXXX / XXXXX XXXXXXXXXXXX XXXXXXXX:####XXXXXXXXXX
XXXXXXXX XXXXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXXX XXXXXXXXXXXX XXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
XXXXXXXXX XXXXXXX:####XXXXXXXXX XXXXX - XXXXX XXXXX
XXXXXXXXX XXXXX:##"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
XXXXXXXXX XXXXXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
XXXXXXXX XXXXXXXXXXXX XXXXXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
XXXXXXXXXXXXX XXXXX:####XXXXXXXX XXXX XXXXXXXXX - XXXX XXXXXX
XXXXXXXXXXXXX XXXX##XXXX:##XXXXXXXX XXXX XXXXXXXXX - XXXX XXXXXX
XXXXX XXXX / XXXXXXX XXXXXXXX:#XX/XX/XX###XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX XXXXX XXX:#####XXXXXXXXXX
XXXXXXX XXXXX XXX:####XXXXXXXXXX
XXXXXXX XXXXX XXX:##XXXXXXXXXX

// after running macro

XX XXXXXXX (XXX / #XX) XXXXXXXXX XX XXXXXXXX XX XX XX/XX/XXXX####
XXXXXXXXX#XXXXXXXX#XXXXXXX#XXXXXX#XXXXXX XXXXXXXXXXX
XXX:####XXXXXXXXXX***
XXXX XXXX:####XXXXXXXXXX
XXX-XXXXX:####XXXXXXXXXX***
XXXX XXXXX:####XXXXXXXXXX
XXXX XXXXX:####XXXXXXXXXX***
XXXX XXXXXX XXXXXXXX:####XXXXXXXXXX
XXX XXXXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
XXX XXXXXXXXX XXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXX XXXXXXXXXXX XXXXX:###XX/XX/XX#XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
XXXXX XXXXXXXX:###XX/XX/XX#XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX XXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
XXXXXXX XXXX XXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX X XXXXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
XXXXXX XXXXXXXXXXX XXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX XXXXXX XXXXXXXX/XXXXXXXX:####XXXXXXXXXX***
XXXXXXXXXXXX XXXXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXXXX XXXX ##XXXXXXXX:####XXXXXXXXX XXXXX - XXXXX XXXXX***
XXX#XXXXXXXXX XXXXXXXXX XXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXX XXXXXXXXXX XXXXX:####XXXXXXXXXX
XXXXX XXXXXXX XXXXXXXX:####XXXXXXXXXX***
XXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX XXXXXXXX - XX XXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
XXX/XXXXX XXXXXXXX:####XXXXXXXXXX
XXXXXX XXXXXXXX:####XXXXXXXXXX***
XXXXX/XXXXXX XXXXX:###XX/XX/XX#XXXXXXXXXX
XXXX / XXXXX XXXXXXXXXXXX XXXXXXXX:####XXXXXXXXXX***
XXXXXXXX XXXXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXXX XXXXXXXXXXXX XXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"***
XXXXXXXXX XXXXXXX:####XXXXXXXXX XXXXX - XXXXX XXXXX
XXXXXXXXX XXXXX:##"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"***
XXXXXXXXX XXXXXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
XXXXXXXX XXXXXXXXXXXX XXXXXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"***
XXXXXXXXXXXXX XXXXX:####XXXXXXXX XXXX XXXXXXXXX - XXXX XXXXXX
XXXXXXXXXXXXX XXXX##XXXX:##XXXXXXXX XXXX XXXXXXXXX - XXXX XXXXXX***
XXXXX XXXX / XXXXXXX XXXXXXXX:#XX/XX/XX###XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
XXXXXXX XXXXX XXX:#####XXXXXXXXXX***
XXXXXXX XXXXX XXX:####XXXXXXXXXX
XXXXXXX XXXXX XXX:##XXXXXXXXXX***


Thanks again for your help.

- Sequoia

Re: Counting TABs in a line

Postby Mofi » Fri Feb 02, 2007 8:39 am

Okay, I solved the problem. I already knew that UE versions prior to v11.10c (I think, I'm not sure) have problems with $ in some replaces. But I don't use such an outdated version anymore; I have just archived it for situations like this one. My first posted macro worked perfectly on your example with UE v11.20b.

Here is the macro which should work also for your version of UE. I have tested it with UE v10.20d. It avoids the problem with $ by inserting the character « at end of every line temporarily and uses this character as end of line indicator.

The macro property Continue if a Find with Replace not found must be checked for this macro.

InsertMode
ColumnModeOff
HexOff
UnixReOff
Bottom
IfColNum 1
Else
"
"
EndIf
Top
Find RegExp "%»"
Replace All "MaRkErChAr1"
Find "«"
Replace All "MaRkErChAr2"
Find "^p"
Replace All "«^p"
Find RegExp "%^([~^t«]++^t[~^t«]++^t[~^t«]++^t[~^t«]++^t«^)"
Replace All "»^1"
Find RegExp "%^([~^t«]++^t[~^t«]++^t[~^t«]++^t[~^t«]++^t[~^t«]+«^)"
Replace All "»^1"
Find RegExp "%^([~»]*^)«"
Replace All "^1***«"
Find RegExp "%»"
Replace All ""
Find RegExp "«$"
Replace All ""
Find MatchCase RegExp "%MaRkErChAr1"
Replace All "»"
Find MatchCase "MaRkErChAr2"
Replace All "«"

Add UnixReOn or PerlReOn (v12+ of UE) at the end of the macro if you do not use UltraEdit style regular expressions by default - see search configuration. Macro command UnixReOff sets the regular expression option to UltraEdit style.

Re: Counting TABs in a line

Postby treedude2525 » Fri Feb 02, 2007 11:15 pm

Thanks Mofi,

Works perfectly. Now I just need to get time to study it so I can figger out how to do similar stuff myself. I'm not really familiar with regular expressions -- looks like I'll have to check 'em out a bit more.

Thanks again and have a great weekend!

- Sequoia

Lines with more or less of a special character

Postby rasch1971 » Fri Jul 06, 2007 8:12 pm

Hello ...

I would like to find all lines of a text file which have more or fewer than 10 "|" characters.

I need this method to check CSV text files (separated by "|") before I import them into my database. I use UltraEdit v12.10.

Kind regards
Ralf

Re: Lines with more or less of a special character

Postby pietzcker » Sat Jul 07, 2007 10:38 am

Hi Ralf,

Does this work? It's not beautiful or optimized, mostly because I don't know whether empty fields may occur. It does assume that the delimiter | doesn't occur within strings or escaped... So
Code:
^(?:[^|\r\n]*\|){1,9}[^|]*$|^(?:[^|\r\n]*\|){11,}[^|]*$


will match each line except lines 2 and 5 of the following:

Code:
asd|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|fg
|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|asd|xcc
|sdf|sdf|asd|sdf|sdf
asd|sdf|sdf|asd|sdf|sdf|asd|sdf||asd|sdf|sdf|asd|sdf|sdf|
||||||||||
||||||
|||||||||||


You'll run into trouble if lines may look like

Code:
asd|sdf|sdf|"as|d"|sdf|sdf|asd|sdf|sdf|asd|xcc


where the | in "as|d" is not supposed to count as a delimiter.

You need to turn Perl regular expressions on and check the "regular expressions" checkbox in the replace dialog. I tested this with V13.10+3, I hope it also works in V12.
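
The expression can also be tried outside UltraEdit. The following is a rough check in Python, whose re module accepts the same syntax for this pattern; it is not what UE runs internally. WRONG_COUNT and is_flagged are names made up for this sketch, and the expression is applied to one line at a time, so nothing can match across a line break.

```python
import re

# The expression from the post, unchanged.
WRONG_COUNT = re.compile(
    r"^(?:[^|\r\n]*\|){1,9}[^|]*$|^(?:[^|\r\n]*\|){11,}[^|]*$"
)

sample = [
    "asd|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|fg",
    "|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|asd|xcc",
    "|sdf|sdf|asd|sdf|sdf",
    "asd|sdf|sdf|asd|sdf|sdf|asd|sdf||asd|sdf|sdf|asd|sdf|sdf|",
    "||||||||||",
    "||||||",
    "|||||||||||",
]


def is_flagged(line):
    """True if the expression matches, i.e. the line has a wrong delimiter count."""
    return WRONG_COUNT.search(line) is not None


if __name__ == "__main__":
    for number, line in enumerate(sample, start=1):
        verdict = "flagged" if is_flagged(line) else "ok"
        print(f"line {number}: {line.count('|')} delimiters -> {verdict}")
```

One thing to keep in mind: a line without any | is not matched, because both alternatives require at least one delimiter, so empty lines and lines with no delimiter at all need a separate check.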

HTH,
Tim

Re: Lines with more or less of a special character

Postby Mofi » Sat Jul 07, 2007 11:34 am

Pietzcker demonstrates here once again the power of the Perl engine.

The solution I used was to mark all lines which have the correct number of delimiters, then mark all the remaining lines, which were not marked before, with a different marker string, and finally delete the first mark from the lines with the correct number of delimiters. What remains are the lines which do not have the correct number of delimiters and so are marked with *** at the start of the line.

Maybe it would be a good idea to write a macro or script which asks the user for the delimiter character and the correct number of delimiters per line and then run the search to mark all the lines with wrong number of delimiters. Such a macro or script could also handle the exception that a delimiter character inside "..." should be ignored.
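
Such a check could look roughly like the following Python sketch. It is only one possible shape, not an existing UE macro or script: count_delimiters and find_bad_lines are hypothetical names, the delimiter and the expected count are passed in as parameters instead of being asked for interactively, and quoting is assumed to use plain double quotes with no escaped quotes inside a field.

```python
def count_delimiters(line, delimiter="|", quote='"'):
    """Count delimiters in `line`, ignoring any that sit inside quotes."""
    count = 0
    in_quotes = False
    for ch in line:
        if ch == quote:
            in_quotes = not in_quotes   # entering or leaving "..."
        elif ch == delimiter and not in_quotes:
            count += 1
    return count


def find_bad_lines(lines, delimiter="|", expected=10):
    """Return (line number, delimiter count) for every line with a wrong count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        found = count_delimiters(line, delimiter)
        if found != expected:
            bad.append((number, found))
    return bad


if __name__ == "__main__":
    rows = [
        'asd|sdf|sdf|"as|d"|sdf|sdf|asd|sdf|sdf|asd|xcc',
        "asd|sdf",
    ]
    for number, found in find_bad_lines(rows):
        print(f"line {number}: {found} delimiters")
```

Reporting the line number and the count, instead of writing a marker into the text, leaves the data file unchanged before the import.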

Re: Lines with more or less of a special character

Postby phelan » Sun Aug 05, 2007 11:59 pm

Mofi wrote:Maybe it would be a good idea to write a macro or script which asks the user for the delimiter character and the correct number of delimiters per line and then run the search to mark all the lines with wrong number of delimiters. Such a macro or script could also handle the exception that a delimiter character inside "..." should be ignored.


Yes, this would be extremely useful.

We deal with a lot of databases and regularly receive txt files of data with delimited fields.

Being able to verify that these files are free from structure errors would be fantastic and save a lot of time.

I have been trying to modify your "counting tabs in a line" macro but no real progress to date.