Character manipulation - Resetting an individual bit

Post by schofield860 » Mon Apr 28, 2008 7:10 am

Hi,

Firstly, this is not so much a find / replace question as a character manipulation question, so apologies if it's posted in the wrong place.

I have a number of files produced from a data feed where traditional 7-bit ASCII characters are sent through, but the top bit (the 8th bit) is being used / abused to carry a flag. The engine that processes this data is aware of the flag and handles the data as appropriate, but the abuse of this bit makes the data unreadable to humans.

E.g.
The ASCII character for A is 0x41 (65) - 1000001
However in the files I have this may come through as
0x41 (65) - 01000001
or 0xC1 (193) - 11000001

To make the data human readable I need to be able to reset this top-bit to 0.
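In terms of bit operations, this amounts to masking each byte with 0x7F (binary 01111111). A minimal sketch in plain JavaScript, where `splitFlaggedByte` is a hypothetical helper name used only for illustration:

```javascript
// Hypothetical helper: split one byte value into its flag bit and its 7-bit ASCII code.
function splitFlaggedByte(byteValue) {
  return {
    flag: (byteValue & 0x80) !== 0,               // top bit (bit 8) carries the feed's flag
    code: byteValue & 0x7F,                       // low seven bits are the ASCII code
    char: String.fromCharCode(byteValue & 0x7F)   // human-readable character
  };
}

const plain = splitFlaggedByte(0x41);   // flag: false, code: 65, char: "A"
const flagged = splitFlaggedByte(0xC1); // flag: true,  code: 65, char: "A"
```

Both 0x41 and 0xC1 end up as "A"; only the flag differs.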

Does anyone know of a simple way to do this or should I be looking to create a script to do this, or am I in the custom tools arena?

Thanks,

Jon
schofield860
Newbie
 
Posts: 2
Joined: Mon Apr 28, 2008 6:52 am

Re: Character manipulation - Resetting an individual bit

Post by jorrasdk » Tue Apr 29, 2008 6:19 am

Hi schofield860 - this is my suggestion for a script that will remove bit 8. It may not be particularly fast if you are working on very large files, though. Also, you will need UE13 or above. Have fun!

Code:
// This script will remove bit 8 in the active file.
// Works only on DOS files (not UTF/Unicode)

// Use perl regular expressions
UltraEdit.perlReOn();

// Start from the top
UltraEdit.activeDocument.top();

// Set up search defaults:
UltraEdit.activeDocument.findReplace.matchCase=false;
UltraEdit.activeDocument.findReplace.matchWord=false;
UltraEdit.activeDocument.findReplace.regExp=true;
UltraEdit.activeDocument.findReplace.searchDown=true;
UltraEdit.activeDocument.findReplace.searchInColumn=false;

// Search for all 8 bit chars 0x80-0xFF
while (UltraEdit.activeDocument.findReplace.find("[\\x80-\\xFF]")) {
   // get selected char
   var char = UltraEdit.activeDocument.selection;
   
   // remove bit 8
   var newChar = to7Bit(char);
   
   // Write the 7 bit char back into the document:
   UltraEdit.activeDocument.write(newChar);
}

// Return to the top
UltraEdit.activeDocument.top();

// Helper function that clears bit 8 of the first character of the input string
function to7Bit(char) {
   // Get decimal charcode
   var dec = char.charCodeAt(0);
   
   // Clear bit 8 with a bitwise AND mask (01111111 = 127 dec = 0x7F).
   // Unlike XOR with 128, this leaves characters that are already 7-bit unchanged.
   var dec7 = dec & 0x7F;

   // Return the new char, converting the decimal code with fromCharCode
   return String.fromCharCode(dec7);
}
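
The same clearing step can also be tried outside UltraEdit. The following is a minimal sketch in plain JavaScript, assuming the file contents are already available as an array of byte values; `clearTopBits` and `bytesToAscii` are hypothetical names, not UltraEdit functions. It uses an AND mask with 0x7F, which leaves bytes that are already 7-bit untouched, so it can be applied to every byte without searching for the flagged ones first.

```javascript
// Hypothetical helper (not part of the UltraEdit API):
// return a copy of the input bytes with the top bit of every byte cleared.
function clearTopBits(bytes) {
  const out = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    out[i] = bytes[i] & 0x7F; // keep the low seven bits only
  }
  return out;
}

// Hypothetical helper: turn 7-bit byte values into a readable string.
function bytesToAscii(bytes) {
  return Array.from(bytes, function (b) {
    return String.fromCharCode(b);
  }).join("");
}

// Sample data: "Hello" with the flag bit set on some of the bytes.
const sample = new Uint8Array([0xC8, 0x65, 0xEC, 0x6C, 0xEF]);
const cleaned = clearTopBits(sample);
const readable = bytesToAscii(cleaned); // "Hello"
```

Running `clearTopBits` a second time on `cleaned` changes nothing, which is the practical difference from toggling the bit with XOR.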
jorrasdk
Master
 
Posts: 275
Joined: Mon Mar 19, 2007 11:00 pm
Location: Denmark

Re: Character manipulation - Resetting an individual bit

Post by schofield860 » Wed Apr 30, 2008 6:01 am

Thanks for your quick reply (actually forgot to check back until today).

Not too bothered about the performance, as it's not for a live processing environment, just to enable me to read the data.

I'm going to have a play with the script and will let you know how I get on. (I will try to add some code to check that the underlying file is in the correct format, etc., before proceeding. This will be a good excuse to learn a bit more about UE's scripting.)
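
One possible shape for such a check, as a rough sketch in plain JavaScript rather than UE's scripting API: look for a Unicode byte order mark at the start of the data and count the bytes that have the top bit set. The name `describeBytes` and the returned fields are hypothetical, and a UTF-8 file without a BOM cannot be told apart this way.

```javascript
// Hypothetical helper: report whether a byte buffer starts with a Unicode BOM
// and how many of its bytes have the top bit set.
function describeBytes(bytes) {
  const hasUtf8Bom = bytes.length >= 3 &&
    bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
  const hasUtf16Bom = bytes.length >= 2 &&
    ((bytes[0] === 0xFF && bytes[1] === 0xFE) ||
     (bytes[0] === 0xFE && bytes[1] === 0xFF));

  let highBitCount = 0;
  for (const b of bytes) {
    if (b & 0x80) {
      highBitCount++;
    }
  }

  const hasBom = hasUtf8Bom || hasUtf16Bom;
  return {
    hasBom: hasBom,
    highBitCount: highBitCount,
    // Assumption: only BOM-free data is treated as safe to strip.
    safeToStrip: !hasBom
  };
}

const feedInfo = describeBytes(new Uint8Array([0xC1, 0x42, 0xC3]));
const utf8Info = describeBytes(new Uint8Array([0xEF, 0xBB, 0xBF, 0x41]));
```

Here `feedInfo.safeToStrip` is true with two flagged bytes, while `utf8Info.safeToStrip` is false because of the UTF-8 BOM.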

Thanks again,

Jon
schofield860
Newbie
 
Posts: 2
Joined: Mon Apr 28, 2008 6:52 am

