Giant file, column search help

Postby bsim » Fri Jan 23, 2009 2:49 pm

So I have a file that has 30,000+ columns. I want to count the number of lines that contain a non-blank value in column 25,000.

Using the search tool, I'm limited to xxxx values when searching by column, so I tried deleting enough columns (20,000) to bring the column I want to search below the 9999 limit (to column 5,000).

However, it appears that UE still thinks the columns go out to 30,000+. When I try to count values in any column (up to 9999), all I get is "0 occurances" for any value (?, *), as if UE has not realized that I deleted the columns themselves, not just the values within them. Shouldn't UE renumber the columns after the deletion?

What's the best way to count (via column) when the column number to be searched is > 9999?
bsim
Newbie
 
Posts: 4
Joined: Fri Jan 23, 2009 2:19 pm

Re: Giant file, column search help

Postby pietzcker » Fri Jan 23, 2009 4:31 pm

I don't have a file like that to test on, but you could try searching for the Perl regular expression ^.{24999}\S (and use the "count all" button in the search dialog).
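For readers who want to check the expression outside the editor, here is a minimal sketch in Python (not UE's own scripting). The function name `count_nonblank_in_column` and the sample data are made up for illustration; the pattern is the same `^.{N-1}\S` idea, built for an arbitrary 1-based column.

```python
import re

def count_nonblank_in_column(lines, column):
    """Count lines whose 1-based `column` holds a non-whitespace character."""
    # Skip the first (column - 1) characters, then require a non-blank one.
    pattern = re.compile(r"^.{%d}\S" % (column - 1))
    return sum(1 for line in lines if pattern.match(line))

# Tiny stand-in for the real file: column 3 is populated in two of four lines.
sample = ["abcd", "ab d", "abc", "ab"]
count = count_nonblank_in_column(sample, 3)
print(count)
```

Lines shorter than the requested column simply fail to match, so they are not counted.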
pietzcker
Master
 
Posts: 241
Joined: Sun Aug 22, 2004 11:00 pm

Re: Giant file, column search help

Postby bsim » Fri Jan 23, 2009 5:16 pm

Thanks! It looks like it's working so far...
bsim
Newbie
 
Posts: 4
Joined: Fri Jan 23, 2009 2:19 pm

Re: Giant file, column search help

Postby bsim » Fri Jan 23, 2009 8:33 pm

One more thing, is this command string-able? Like can I put 2 or more of these together to count multiple columns?

^.{24999}\S + ^.{25999}\S + ^.{26999}\S

??
bsim
Newbie
 
Posts: 4
Joined: Fri Jan 23, 2009 2:19 pm

Re: Giant file, column search help

Postby Mofi » Sat Jan 24, 2009 8:50 am

^.{24999}\S

means: find a string that starts at the beginning of a line with 24999 characters of any type except newline characters, where the 25,000th character is not a white-space character (space, tab, newline).

You should be able to combine such searches with an OR expression:

^(.{24999}\S|.{25999}\S|.{26999}\S)

But I have not tested if this is really possible.
Mofi
Grand Master
 
Posts: 4054
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

Re: Giant file, column search help

Postby pietzcker » Sat Jan 24, 2009 10:53 am

Mofi's expression will work if you want to match a line that has at least one non-blank character in column 25000, 26000 and/or 27000. It could be optimized a little by writing it as

^.{24999}(?:.{1000}){0,2}?\S

^.{24999} will match the first 24999 characters of the line.

(?:.{1000}){0,2}? will match zero, one or two occurrences of 1000 consecutive characters, preferring as few as possible (that's the reason for the final ?, which makes the preceding quantifier "lazy").

\S will then match a non-blank character.

If you want an "and"-evaluation, i. e. match only if there is a non-blank in all three positions, then you could use

^.{24999}\S(?:.{999}\S){2}
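To see the difference between the "or" and the "and" form, here is a small sketch in Python, scaled down to columns 3, 5 and 7 so the test lines stay readable. The helper `build_patterns` and the sample lines are assumptions for illustration only; with `first=25000, step=1000, n=3` it produces exactly the two expressions above.

```python
import re

def build_patterns(first, step, n):
    """Build the 'any column' and 'all columns' patterns for n columns
    starting at 1-based column `first`, spaced `step` columns apart."""
    skip = first - 1
    # "or": skip to the first column, then lazily skip 0..n-1 further steps.
    any_pat = re.compile(r"^.{%d}(?:.{%d}){0,%d}?\S" % (skip, step, n - 1))
    # "and": require a non-blank at the first column and at each later one.
    all_pat = re.compile(r"^.{%d}\S(?:.{%d}\S){%d}" % (skip, step - 1, n - 1))
    return any_pat, all_pat

any_pat, all_pat = build_patterns(3, 2, 3)

lines = [
    "  x    ",  # column 3 only
    "  x x x",  # columns 3, 5 and 7
    "      x",  # column 7 only
    "xx     ",  # none of the three columns
]
any_count = sum(1 for line in lines if any_pat.match(line))
all_count = sum(1 for line in lines if all_pat.match(line))
print(any_count, all_count)
```

Both patterns count lines, not fields: a line with all three columns populated still contributes only one match to either count.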
pietzcker
Master
 
Posts: 241
Joined: Sun Aug 22, 2004 11:00 pm

Re: Giant file, column search help

Postby bsim » Mon Jan 26, 2009 1:14 am

Thanks fellas, I'm making progress. And thank you for the color coding, it helps tremendously!

Now I think the last remaining step is to combine the counts. The above examples come close, but don't give exactly what I'm looking for.

To help explain, here's an example of what I'm doing, and what I'm looking for. As you can see, my data is in columns, with space values where appropriate. Using pietzcker's example, I can search a column and count the populated fields. In this case "4" is correct:
Image

So I then go through and do the rest of the columns as well, and add them up (in this case, 4 + 4 + 2 + 1 = 11).

However, as I have 200 columns to search in 60 files, I would like to have a string that sums (a file at a time) this way:
Image

When I try Mofi's example, it seems to stop at the first column count. Maybe because I have space values between the columns I want to count?

Thanks again!!!
bsim
Newbie
 
Posts: 4
Joined: Fri Jan 23, 2009 2:19 pm

Re: Giant file, column search help

Postby pietzcker » Mon Jan 26, 2009 2:52 am

I see. Well, Mofi's and my first regex will report "a match" if at least one of the three positions contains a non-blank character. Therefore you get one match even if the current line could match three times (strictly speaking: the regex aborts after the first successful match and doesn't even try the other options if the first one is already met).

The only thing you can do is use one regex for each position (^.{24999}\S, ^.{25999}\S etc.), apply them sequentially and add the results - just like you did. Since that appears to be too much work to do manually, you'd need a script. I don't know UE's JavaScript well enough; perhaps Mofi or jorrasdk can help.
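As a rough idea of what such a script would do, here is a sketch in standalone Python rather than UE's JavaScript (whose API is not shown in this thread). The function name `count_populated_fields` and the sample data are hypothetical; it applies one `^.{N-1}\S` pattern per column and adds up the results.

```python
import re

def count_populated_fields(lines, columns):
    """Return per-column counts of non-blank fields and their sum.

    `columns` holds 1-based column numbers; one regex is used per column,
    mirroring the "one search per position, then add" approach."""
    patterns = {c: re.compile(r"^.{%d}\S" % (c - 1)) for c in columns}
    per_column = {c: 0 for c in columns}
    for line in lines:
        for c, pattern in patterns.items():
            if pattern.match(line):
                per_column[c] += 1
    return per_column, sum(per_column.values())

# Small stand-in data with fields in columns 1, 3 and 5.
sample = [
    "a b c",
    "a   c",
    "  b  ",
    "a b  ",
]
per_column, total = count_populated_fields(sample, [1, 3, 5])
print(per_column, total)
```

For a real file, the list of lines could be replaced by an open file object, since iterating over it yields one line at a time; the trailing newline does not affect the result because neither `.` nor `\S` matches it.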
pietzcker
Master
 
Posts: 241
Joined: Sun Aug 22, 2004 11:00 pm

