I can see now, with some example lines of real data, that the expression
.*?,
which I used in all my previous posts was not good. It is a non-greedy expression, but it is nevertheless too greedy when multiple values in a line are empty.
Much better is the expression
[^,\r\n]*,
which always matches exactly 1 data field. This expression means: find zero or more characters which are neither a comma, nor a carriage return, nor a line-feed (negative character class), followed by a comma.
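The difference can be checked outside the editor. Below is a minimal sketch in Python, assuming that Python's re module treats these simple constructs the same way as the Perl engine. The sample line is taken from this thread, both patterns are extended by the test for an empty second value, and the variable names are only illustrative.

```python
import re

# Example line from this thread: the second value is "1", the third one is empty.
line = '"Line 2", "1", "", "Smith"'

# Both patterns are meant to report only lines whose SECOND value is "".
lazy = re.compile(r'^.*?, *"",')           # non-greedy, but may run across commas
strict = re.compile(r'^[^,\r\n]*, *"",')   # negative character class, exactly 1 field

lazy_match = lazy.search(line)
strict_match = strict.search(line)

# The non-greedy version expands over the first comma until the rest fits.
print(lazy_match.group(0))   # "Line 2", "1", "",
# The character class cannot cross a comma, so this line is correctly skipped.
print(strict_match)          # None
```

So the non-greedy expression reports a line whose third value is empty, while the negative character class does not.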
I replaced the expressions in all posts above with the better one in case other forum members are reading here too.
The non-marking group with the multiplier {1} is not necessary for finding strings between the first and second comma in a line, as you already know.
The better search string for your task of deleting all lines with "" (empty field in double quotes) between the first and second comma is:
^[^,\r\n]*, *"",.*\r\n
As there is a space character between the first comma and the first double quote character of the second value in your example lines, the expression now also contains a space character followed by an asterisk. With this addition zero or more spaces are allowed between the first comma and the double quote of the second data field. So lines 4 to 6 in the example below are found with this expression and deleted.
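The same deletion can be tried outside the editor. This is only a sketch in Python, assuming that the re module with the MULTILINE flag behaves like the Perl engine for this search string and that the file uses DOS line terminators (\r\n), which the search string requires. The names lines, text and result are illustrative.

```python
import re

# Example lines of this post, joined with DOS line terminators because
# the search string ends with \r\n.
lines = [
    '"Column1", "Column2", "Column3", "Column4"',
    '"Line 2", "1", "", "Smith"',
    '"Line 3", "1", "X", "Jones"',
    '"Line 4","", "", ""',
    '"Line 5", "", "", ""',
    '"Line 6", "", "", ""',
    '"Line 7", "1", "XYZ", "Walker"',
]
text = "\r\n".join(lines) + "\r\n"

# MULTILINE lets ^ match at the beginning of every line, as in the editor.
pattern = re.compile(r'^[^,\r\n]*, *"",.*\r\n', re.MULTILINE)

# Replacing every match with an empty string deletes the whole line
# including its line terminator.
result = pattern.sub("", text)
print(result)
```

Only the header line and the lines 2, 3 and 7 remain in result. The input used here is the example that follows.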
"Column1", "Column2", "Column3", "Column4"
"Line 2", "1", "", "Smith"
"Line 3", "1", "X", "Jones"
"Line 4","", "", ""
"Line 5", "", "", ""
"Line 6", "", "", ""
"Line 7", "1", "XYZ", "Walker"
If you want to delete the third value from all lines, regardless of whether it is empty or not, you need a tagged regular expression.
The search string for this task is
^((?:[^,\r\n]*,){2})[^,\r\n]*,
and the replace string is just
\1
If you or anybody else wants to know more about Perl regular expressions to better understand them, see the IDM power tips.
More useful links to webpages about Perl regular expressions can be found in the forum announcement
Readme for the Find/Replace/Regular Expressions forum
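For the tagged expression that deletes the third value, here is a small sketch in Python, again assuming that the re module handles the marking group and the back-reference \1 like the Perl engine. The three sample lines come from the example in this post, and the names sample and cleaned are illustrative.

```python
import re

sample = [
    '"Column1", "Column2", "Column3", "Column4"',
    '"Line 2", "1", "", "Smith"',
    '"Line 3", "1", "X", "Jones"',
]

# Group 1 tags the first two fields including their commas; the untagged
# rest of the match is the third field with its comma.
pattern = re.compile(r'^((?:[^,\r\n]*,){2})[^,\r\n]*,')

# Replacing the match with \1 keeps the two tagged fields and drops the third.
cleaned = [pattern.sub(r'\1', line) for line in sample]

for line in cleaned:
    print(line)
# "Column1", "Column2", "Column4"
# "Line 2", "1", "Smith"
# "Line 3", "1", "Jones"
```

The third value is removed no matter whether it is empty or not, and the space in front of the fourth value is kept.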