How can a script detect Base64?

Post by Bracket » Fri Jan 29, 2010 6:07 pm

I have text files that are a combination of plaintext and base64. I'm looking to write a script that will decode the base64 sections.

What I need to know is whether there is any way to programmatically detect them, so they can be decoded without touching the plaintext.


Does anyone know how I can do this?

Re: How can a script detect Base64?

Post by Mofi » Sat Jan 30, 2010 11:21 am

Base64 encoding is mainly used by email applications, which normally store a header above each embedded file encoded with Base64. So you can search for those headers and evaluate them in the script to correctly decode the Base64-encoded block below each header.

If your file does not contain such headers, I suggest using a regular expression find that searches for strings not containing a space or tab character. Every found string longer than X bytes (let's say 100) is typically not normal text and is therefore probably a Base64-encoded block.
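
To try this length heuristic outside the editor, here is a minimal sketch in plain JavaScript. The helper name `findLongTokens` is hypothetical and the threshold of 100 is just the example value from above. Note that `\S` also stops at line breaks, not only at spaces and tabs, and that the function only reports candidates; it does not prove that a token is Base64.

```javascript
// Hypothetical helper: report every run of non-whitespace characters that is
// at least minLength characters long. Such runs are only candidates for
// Base64; long URLs or file paths will be reported as well.
function findLongTokens(text, minLength) {
  var pattern = new RegExp("\\S{" + minLength + ",}", "g");
  var hits = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    hits.push({ index: match.index, length: match[0].length });
  }
  return hits;
}

// Usage: a 120-character token surrounded by ordinary words.
var sampleText = "plain words " + new Array(121).join("A") + " more words";
var found = findLongTokens(sampleText, 100);
console.log(found.length, found[0].index, found[0].length);
// prints: 1 12 120
```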

Re: How can a script detect Base64?

Post by Bracket » Sat Jan 30, 2010 5:40 pm

Unfortunately, not all the strings are that long. And shorter strings can be mistaken for email addresses.

I was able to get a serviceable result by limiting the search to the characters that Base64 uses:

(?:[A-Za-z0-9+/=]{20,})


So, I'll use that. Thanks.
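
A character class alone still accepts any long alphanumeric word, so a candidate can be screened a little further before it is decoded. The sketch below assumes standard padded Base64 (length a multiple of 4, `=` only at the very end); `looksLikeBase64` is a hypothetical helper name. Unpadded or line-wrapped Base64 would need different rules.

```javascript
// Hypothetical screening step for a string that already matched the
// character-class search. Assumes padded Base64 with at least 20 characters.
function looksLikeBase64(candidate) {
  if (candidate.length < 20 || candidate.length % 4 !== 0) {
    return false;
  }
  // Alphabet characters first, then at most two '=' as trailing padding.
  return /^[A-Za-z0-9+\/]+={0,2}$/.test(candidate);
}

console.log(looksLikeBase64("QUJDREVGR0hJSktMTU5PUA==")); // true
console.log(looksLikeBase64("QUJDREVGR0hJSktMTU5PUA="));  // false (length 23)
console.log(looksLikeBase64("QUJD=EVGR0hJSktMTU5PUA==")); // false ('=' in the middle)
```

An email address contains `@` and `.`, which are outside the Base64 alphabet, so only a fragment of it can match the search. The length rule rejects some of those fragments, but a long plain word whose length happens to be a multiple of 4 still passes.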

Re: How can a script detect Base64?

Post by rev0luci0n » Thu May 24, 2012 7:21 pm

A long shot, but Bracket, did you end up writing a script to do this?

Our document management system has cataloged a bunch of text files with base64 encoding. I need to write a script that can go through them all and convert only the blocks that are base64 encoded, leaving the plain text alone.

Re: How can a script detect Base64?

Post by Bracket » Thu May 24, 2012 7:54 pm

rev0luci0n wrote: A long shot but Bracket did you end up writing a script to do this?

Actually, I did. It serves my needs perfectly. It's possible that you might need to tweak the RegEx strings for your own documents. But see how this works for you:

Code:
var WorkingFile = UltraEdit.activeDocument;

// Function to execute a RegEx Find. Set CaseFlag to 1 for Case Sensitive, or 0 for not.
function findRegEx(SearchString, CaseFlag)
{
   UltraEdit.perlReOn();
   
   if (CaseFlag == 1)
   {
      WorkingFile.findReplace.matchCase = true;
   }
   else
   {
      WorkingFile.findReplace.matchCase = false;
   }
   
   WorkingFile.findReplace.regExp = true;
   WorkingFile.findReplace.find(SearchString);
}


function Base64Decode(SearchString)
{

   // Flag to indicate if the loop should be broken
   var BreakFlag = 0;
   
   WorkingFile.top();
   
   // Run the search and decode every found string
   do
   {
      findRegEx (SearchString, 0);
      
      // If it found a match, decode it.
      if (WorkingFile.isFound() == true)
      {
         WorkingFile.decodeBase64();
      }
      
      else
      {
         BreakFlag = 1;
      }
   
   } while (BreakFlag == 0);

   WorkingFile.top();
}


// ---------------------------------------------------------------------
// Main Execution
// ---------------------------------------------------------------------


Base64Decode("(?<=>>> 334 ).*");
Base64Decode("(?:[A-Za-z0-9+/=]{20,})");
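
The same idea can be tried on a plain string outside UltraEdit. This is a sketch under assumptions, not a port of the script above: it assumes Node.js (for the global `Buffer`), padded Base64, and UTF-8 text inside the encoded blocks. The names `decodeBase64Runs` and `isPrintable` are hypothetical. A match is replaced only when the decoded result looks like text; otherwise it is left untouched.

```javascript
// Assumption: Node.js, where the global Buffer class can decode Base64.
// Candidate runs: 20 or more Base64 alphabet characters plus optional padding.
var BASE64_RUN = /[A-Za-z0-9+\/]{20,}={0,2}/g;

// Treat text as printable when it has no control characters (other than tab,
// CR, LF) and no U+FFFD, which Buffer emits for invalid UTF-8 sequences.
function isPrintable(text) {
  return !/[\u0000-\u0008\u000B\u000C\u000E-\u001F\uFFFD]/.test(text);
}

// Replace each plausible Base64 run with its decoded text, keeping the rest.
function decodeBase64Runs(text) {
  return text.replace(BASE64_RUN, function (run) {
    if (run.length % 4 !== 0) {
      return run; // not padded Base64, leave as is
    }
    var decoded = Buffer.from(run, "base64").toString("utf8");
    return isPrintable(decoded) ? decoded : run;
  });
}

var encoded = Buffer.from("This is a hidden message.").toString("base64");
console.log(decodeBase64Runs("before " + encoded + " after"));
// prints: before This is a hidden message. after
```

Because `String.prototype.replace` walks the original text once, decoded output is never searched again, so a decoded block that happens to contain 20 or more Base64 characters is not decoded a second time.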

Re: How can a script detect Base64?

Post by Mofi » Fri May 25, 2012 1:12 am

Good script, Bracket. But it can be made faster. As a first step, some lines can be removed by using boolean variables instead of integer variables:

Code:
var WorkingFile = UltraEdit.activeDocument;

// Function to execute a RegEx Find. Set CaseFlag to true for Case Sensitive, or false for not.
function findRegEx(SearchString, CaseFlag)
{
   UltraEdit.perlReOn();
   WorkingFile.findReplace.matchCase = CaseFlag;
   WorkingFile.findReplace.regExp = true;
   WorkingFile.findReplace.find(SearchString);
}

function Base64Decode(SearchString)
{

   // Flag to indicate if the loop should be broken
   var BreakFlag = false;

   WorkingFile.top();

   // Run the search and decode every found string
   do
   {
      findRegEx (SearchString, false);

      // If it found a match, decode it.
      if (WorkingFile.isFound() == true)
      {
         WorkingFile.decodeBase64();
      }

      else
      {
         BreakFlag = true;
      }

   } while (BreakFlag == false);

   WorkingFile.top();
}

// ---------------------------------------------------------------------
// Main Execution
// ---------------------------------------------------------------------

Base64Decode("(?<=>>> 334 ).*");
Base64Decode("(?:[A-Za-z0-9+/=]{20,})");

In a second step, it can be sped up once more by removing even more lines (function calls), finally resulting in:

Code:
var WorkingFile = UltraEdit.activeDocument;

function Base64Decode(SearchString)
{
   WorkingFile.top();
   UltraEdit.perlReOn();
   WorkingFile.findReplace.matchCase = false;
   WorkingFile.findReplace.regExp = true;
   while(WorkingFile.findReplace.find(SearchString))
   {
      WorkingFile.decodeBase64();
   }
   WorkingFile.top();
}

// ---------------------------------------------------------------------
// Main Execution
// ---------------------------------------------------------------------

Base64Decode("(?<=>>> 334 ).*");
Base64Decode("(?:[A-Za-z0-9+/=]{20,})");

Please note that I have executed neither the script written by Bracket nor the modified scripts above, so I don't know if they really work.
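
One way to dry-run the loop logic of the final version without UltraEdit is a tiny stand-in object. Everything in `createFakeDocument` is hypothetical: it is not the real UltraEdit API, and the cursor and selection behavior (search continues after the last match, cursor lands after the decoded text) are assumptions that may differ from the editor. It also assumes Node.js for `Buffer`.

```javascript
// Hypothetical stand-in for the parts of UltraEdit used by the script above.
// NOT the real API; cursor and selection behavior are assumptions.
function createFakeDocument(initialText) {
  var doc = {
    text: initialText,
    cursor: 0,
    selection: null,
    top: function () {
      doc.cursor = 0;
      doc.selection = null;
    },
    decodeBase64: function () {
      if (doc.selection === null) {
        return;
      }
      var start = doc.selection.start;
      var end = doc.selection.end;
      var decoded = Buffer.from(doc.text.slice(start, end), "base64").toString("utf8");
      doc.text = doc.text.slice(0, start) + decoded + doc.text.slice(end);
      doc.cursor = start + decoded.length; // assumed: cursor ends after decoded text
      doc.selection = null;
    }
  };
  doc.findReplace = {
    matchCase: false,
    regExp: false,
    find: function (searchString) {
      var pattern = new RegExp(searchString, doc.findReplace.matchCase ? "g" : "gi");
      pattern.lastIndex = doc.cursor; // search forward from the cursor
      var match = pattern.exec(doc.text);
      if (match === null || match[0].length === 0) {
        return false; // empty matches count as "not found" in this stand-in
      }
      doc.selection = { start: match.index, end: match.index + match[0].length };
      doc.cursor = doc.selection.end;
      return true;
    }
  };
  return doc;
}

var UltraEdit = { perlReOn: function () {} }; // no-op stand-in
var sample = Buffer.from("This is a hidden message.").toString("base64");
var WorkingFile = createFakeDocument("note: " + sample + " end");

// Same loop as the final version above, run against the stand-in.
function Base64Decode(SearchString) {
  WorkingFile.top();
  UltraEdit.perlReOn();
  WorkingFile.findReplace.matchCase = false;
  WorkingFile.findReplace.regExp = true;
  while (WorkingFile.findReplace.find(SearchString)) {
    WorkingFile.decodeBase64();
  }
  WorkingFile.top();
}

Base64Decode("(?:[A-Za-z0-9+/=]{20,})");
console.log(WorkingFile.text);
// prints: note: This is a hidden message. end
```

This only checks that the loop terminates and decodes one block under the stated assumptions; it says nothing about how the real editor positions the cursor after `decodeBase64()`.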

Re: How can a script detect Base64?

Post by Bracket » Fri May 25, 2012 1:26 am

Thanks for the suggestions. I can't imagine the boolean vs. integer flags making much of a difference (or rather, they shouldn't). As for the second set of modifications, it's true that the script can be optimized. I've written function libraries that I use all the time, so I end up putting my scripts together in a more modular format for rapid creation rather than for the most streamlined execution. :)