Sort feature - removing duplicates if only line parts are identical

Postby dankanze » Tue Jul 15, 2008 12:14 pm

For several years now I have suggested a change to UltraEdit's sort feature. Sort allows the removal of duplicate records, but it only considers a full record match. It would be far more useful if duplicates could be excluded based on the sort key criteria.
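
To illustrate the difference, here is a minimal Python sketch contrasting a full-record match with a key-based match. It is not UltraEdit functionality; the helper name `dedupe`, the sample records, and the choice of columns 1-4 as the key are all invented for illustration.

```python
def dedupe(lines, key=lambda line: line):
    """Keep the first line seen for each distinct key, preserving input order."""
    seen = set()
    kept = []
    for line in lines:
        k = key(line)
        if k not in seen:
            seen.add(k)
            kept.append(line)
    return kept


# Hypothetical records: columns 1-4 hold an ID, the rest is free text.
records = [
    "A001 first entry",
    "A001 second entry",
    "B002 first entry",
    "A001 first entry",
]

# Full-record match: only the exact repeat of the first line is dropped.
full_match = dedupe(records)

# Key-based match on columns 1-4: one line per ID survives.
key_match = dedupe(records, key=lambda line: line[:4])
```

With a full-record match, "A001 second entry" survives because it differs after the key; with the key-based match it is treated as a duplicate of the first A001 line.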

Does anyone else see this as a useful enhancement? I have been asking IDM for several years now, but they have not implemented it. Perhaps if there were interest among the user community, this could get done.

Thanks
Dan
dankanze
Newbie
 
Posts: 2
Joined: Tue Jul 15, 2008 12:09 pm

Re: Sort feature - removing duplicates if only line parts are identical

Postby pietzcker » Tue Jul 15, 2008 12:41 pm

I think that's something you could easily do in a macro (doing a sort and then a regex-search/replace to weed out duplicates in whatever way you choose) or a script (prompting the user for input as to which "rules" should apply for duplicate removal). Do you have an example of what you're trying to do?
pietzcker
Master
 
Posts: 241
Joined: Sun Aug 22, 2004 11:00 pm

Re: Sort feature - removing duplicates if only line parts are identical

Postby dankanze » Tue Jul 15, 2008 12:45 pm

I work with many different files and, depending on the analysis I am doing, I may wish to remove dupes based on different columns. Currently I am using SAS to remove dupes. To me the implementation currently in UE is useless, and I would love to see it changed. UE is a great tool and I would like to stay in one environment rather than bounce between programs.
dankanze
Newbie
 
Posts: 2
Joined: Tue Jul 15, 2008 12:09 pm

Re: Sort feature - removing duplicates if only line parts are identical

Postby sklad2 » Wed Jul 16, 2008 2:49 pm

I agree about having sort options that let you pick criteria and use those criteria to eliminate dups.
sklad2
Advanced User
 
Posts: 59
Joined: Thu Mar 08, 2007 12:00 am

Re: Sort feature - removing duplicates if only line parts are identical

Postby jepqt » Wed Oct 15, 2008 11:08 am

I strongly agree that this is a major shortcoming of UE. If I find other software that does this, I'll probably junk UE. It is obvious that those designing this product did not ask programmers for their input on this. It is also obvious that, since the company did not add your suggestion, it is NOT responsive to the user community. One should NOT have to write a script or anything else to delete duplicate records based on the sort key.
jepqt
Newbie
 
Posts: 3
Joined: Wed Oct 15, 2008 11:02 am

Re: Sort feature - removing duplicates if only line parts are identical

Postby hendo » Sun Apr 26, 2009 1:41 pm

Could someone please help me write a macro which removes the complete line if there are duplicates within col 21 - col 81?
hendo
Newbie
 
Posts: 3
Joined: Sun Apr 26, 2009 1:40 pm

Re: Sort feature - removing duplicates if only line parts are identical

Postby pietzcker » Sun Apr 26, 2009 2:35 pm

What do you mean? Remove two consecutive lines if the contents of columns 21-81 are identical?
pietzcker
Master
 
Posts: 241
Joined: Sun Aug 22, 2004 11:00 pm

Re: Sort feature - removing duplicates if only line parts are identical

Postby hendo » Sun Apr 26, 2009 3:01 pm

I have a database file with my customers' addresses, phone numbers etc.
In col 21-80 I have the company name. Some names are the same, but the company has several addresses and identical phone numbers.

I would like to remove all of the identical company names, leaving me with only one line for that company.

In plain terms: locate and delete duplicated company names within col 21-81, along with the rest of the line, leaving me with only one entry for that company rather than XX entries because of several addresses.


hope this makes sense :)
hendo
Newbie
 
Posts: 3
Joined: Sun Apr 26, 2009 1:40 pm

Re: Sort feature - removing duplicates if only line parts are identical

Postby pietzcker » Mon Apr 27, 2009 1:43 am

OK. I guess that means that "duplicate entries" within your file are not necessarily on consecutive lines.

The macro below (try it on a copy of your data first!) does the following:

1. Ensure that the last line in the file is CRLF terminated.
2. Sort the file according to columns 21 and up (this step cannot be undone!)
3. Remove all lines where columns 21-81 (both included, i.e. 61 characters!) are identical, leaving only the first occurrence.

InsertMode
ColumnModeOff
HexOff
Key Ctrl+END
IfColNumGt 1
"
"
EndIf
Top
SortAsc 21 -1 0 0 0 0 0 0
PerlReOn
Find RegExp "^(.{20}(.{61}).*)\r\n(.{20}\2.*\r\n)+"
Replace All "\1\r\n"


Tested on UE V15.00.0.1043
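
For readers who want to follow the logic outside UltraEdit, the three steps above can be sketched in Python. This is an approximation under stated assumptions, not a replacement for the macro: the function name `sort_and_dedupe` is hypothetical, the text is assumed to use CRLF line endings, and Python's sort order will not necessarily match UltraEdit's. The regular expression is the one from the macro, compiled with `re.MULTILINE` so that `^` matches at every line start.

```python
import re

# Same pattern as the macro's Find RegExp.
# Group 1: first line of a run; group 2: columns 21-81 (61 characters).
DUP_RUN = re.compile(r"^(.{20}(.{61}).*)\r\n(.{20}\2.*\r\n)+", re.MULTILINE)


def sort_and_dedupe(text):
    # Step 1: make sure the last line is CRLF terminated.
    if text and not text.endswith("\r\n"):
        text += "\r\n"
    # Step 2: sort the lines by everything from column 21 onward.
    lines = text.split("\r\n")[:-1]
    lines.sort(key=lambda line: line[20:])
    text = "".join(line + "\r\n" for line in lines)
    # Step 3: collapse each run of lines sharing columns 21-81 to its first line.
    return DUP_RUN.sub(lambda m: m.group(1) + "\r\n", text)
```

As in the macro, lines shorter than 81 characters never match the pattern and are therefore never removed.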
pietzcker
Master
 
Posts: 241
Joined: Sun Aug 22, 2004 11:00 pm

Re: Sort feature - removing duplicates if only line parts are identical

Postby hendo » Tue Apr 28, 2009 2:37 am

This seems to be working great. Thanks a lot for your time :)
hendo
Newbie
 
Posts: 3
Joined: Sun Apr 26, 2009 1:40 pm

Re: Sort feature - removing duplicates if only line parts are id

Postby Mofi » Tue Jul 14, 2009 6:41 am

Starting with UE v15.10.0 it is possible to sort a file and remove lines when only line parts are identical, see power tip Advanced file sort for details.
Mofi
Grand Master
 
Posts: 4066
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

