[html] convert all links to lowercase

Postby ultra » Thu Jul 20, 2006 3:13 pm

Hi
I'm the new webmaster of a website (not built by me), and I have to fix some broken links. They are broken because they contain uppercase characters and the server (Linux) is case-sensitive.

Example of dead link:
Code:
<a href ="../TEST/file.htm" class="super" target="_blank">


I want to find them and correct them manually.
My regular expression is:

Code:
<a\shref\s=\s"[a-z0-9]*[A-Z]+"

but it is not working. Could you please help me correct it?

Regards

Re: [html] convert all links to lowercase

Postby xilduq » Fri Jul 21, 2006 3:44 pm

First, make a backup using tar or whatever.

Since you're using Linux, you should have Perl available. Below is a shell command which will:
1.) find all plain files with the extension .html or .htm in and below the directory /home/siteroot,
2.) escape any non-word characters, such as spaces, in the pathname,
3.) filter out those files which do not contain 'href',
4.) do an in-place Perl search-and-replace on <a href...> tags and <link ... href...> tags, lowercasing only the text between href= (with an optional quote) and the first occurrence of '>', '?', or another quote.

Note: a backup file with the extension .bak is created for each file.

find /home/siteroot -type f \( -name '*.html' -o -name '*.htm' \) | perl -lne "print quotemeta" | xargs grep -li href | xargs perl -i'.bak' -pe 's/(<(?:a|link)(?:(?!href).)+href\s*=\s*[\x27"]?)([^>?\x27"]+)/$1\L$2/igs'
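
For readers who would rather not chain find, xargs and Perl, the same four steps can be roughly sketched in Python. This is only an illustration under some assumptions: the names TAG_HREF_RE, lowercase_hrefs and process_tree are made up for this sketch, the pattern is a slightly stricter variant of the one-liner (it stays inside a single tag), the whole file is processed at once instead of line by line, and a .bak copy is written only for files that actually change.

```python
import re
import shutil
from pathlib import Path

# Group 1: an <a ...> or <link ...> tag up to and including href= and an
# optional quote. Group 2: the value, ending at a quote, '>' or '?'.
# Bytes are used so that only ASCII letters are lowercased and the file
# encoding is left alone.
TAG_HREF_RE = re.compile(
    rb"""(<(?:a|link)\b[^>]*?\bhref\s*=\s*['"]?)([^>?'"]+)""",
    re.IGNORECASE,
)


def lowercase_hrefs(data: bytes) -> bytes:
    """Lowercase the href value in <a> and <link> tags."""
    return TAG_HREF_RE.sub(lambda m: m.group(1) + m.group(2).lower(), data)


def process_tree(root):
    """Rewrite .html/.htm files below root; return the files that changed."""
    changed = []
    # sorted() materializes the listing before any .bak files are created
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in (".html", ".htm"):
            continue
        data = path.read_bytes()
        if b"href" not in data.lower():  # same filter as 'grep -li href'
            continue
        new_data = lowercase_hrefs(data)
        if new_data != data:
            # keep the original next to the file, like perl -i'.bak'
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_bytes(new_data)
            changed.append(path)
    return changed
```

Calling process_tree("/home/siteroot") would then do the rewrite for the directory used in the command above. Take a full backup first, as with the shell version.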

<eom>

Re: [html] convert all links to lowercase

Postby ultra » Mon Jul 24, 2006 7:47 am

Thanks for your answer
but I use Linux only for the server. I want a regular expression for UltraEdit, not for sed, please.

Re: [html] convert all links to lowercase

Postby Mofi » Mon Jul 24, 2006 8:17 am

It seems the Perl users are all on holiday, so I, a user of the UltraEdit-style regex engine, had to work out the Perl regex myself.

The following Perl regular expression replace converts the URL of every <a href="..."> to lowercase:

Find: <a[ \t\r\n]*href[ \t\r\n]*=[ \t\r\n]*"([^"]*)"
Replace: <a href="\L\1\E"

You should be able to adapt it for <img src="..."> and <link rel="..." href="..."> on your own.

Test input:

Code:
<a href ="../TEST/file.htm" class="super" target="_blank">

<a href= "../TEST/file.htm" class="super" target="_blank">

<a href="../TEST/file.htm" class="super" target="_blank">

<a href = "../TEST/FILE.HTM" class="super" target="_blank">

<a
  href ="../TEST/file.htm" class="super" target="_blank">

<a href =
"../TEST/file.htm" class="super" target="_blank">


Output of the replace:

Code:
<a href="../test/file.htm" class="super" target="_blank">

<a href="../test/file.htm" class="super" target="_blank">

<a href="../test/file.htm" class="super" target="_blank">

<a href="../test/file.htm" class="super" target="_blank">

<a href="../test/file.htm" class="super" target="_blank">

<a href="../test/file.htm" class="super" target="_blank">
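
Outside UltraEdit, the same find/replace can be approximated in Python. Python's re module has no \L...\E case conversion in the replacement string, so this sketch uses a replacement function instead. The names A_HREF_RE and lowercase_a_href are invented for the example, and two details are assumptions rather than part of the expression above: at least one whitespace character is required after <a, and the capture group is [^"]* so that the match stops at the closing quote of the URL.

```python
import re

# <a, whitespace, href, optional whitespace around '=', then the quoted URL.
# \s also covers the line breaks that appear in some of the test inputs.
A_HREF_RE = re.compile(r'<a\s+href\s*=\s*"([^"]*)"')


def lowercase_a_href(html: str) -> str:
    """Rewrite <a href="..."> with a lowercased URL and normalized spacing."""
    return A_HREF_RE.sub(
        lambda m: '<a href="' + m.group(1).lower() + '"',
        html,
    )


sample = '<a href =\n"../TEST/FILE.HTM" class="Super" target="_blank">'
print(lowercase_a_href(sample))
# <a href="../test/file.htm" class="Super" target="_blank">
```

Note that the other attributes (class="Super" here) keep their case, because only the captured URL is passed through lower().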


But maybe the following would be better:

Find: [ \t\r\n]*href[ \t\r\n]*=[ \t\r\n]*"([^"]*)"
Replace:  href="\L\1\E" (the replace string starts with a single space character)

Why? It also works for href="..." in <link> and for <a name="..." href="...">.
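
A minimal Python sketch of this more general variant, for anyone who wants to try it outside the editor. ANY_HREF_RE and lowercase_any_href are hypothetical names, and the sketch assumes at least one whitespace character before href (so that attributes such as data-href are left alone), which is slightly stricter than the expression above.

```python
import re

# Any href="..." attribute that is preceded by whitespace, in any tag.
ANY_HREF_RE = re.compile(r'\s+href\s*=\s*"([^"]*)"')


def lowercase_any_href(html: str) -> str:
    """Lowercase the URL of every href="..." attribute, whatever the tag."""
    return ANY_HREF_RE.sub(
        lambda m: ' href="' + m.group(1).lower() + '"',
        html,
    )


print(lowercase_any_href('<link rel="stylesheet"  href = "CSS/Main.CSS">'))
print(lowercase_any_href('<a name="Top" href="PAGE.HTM">'))
```

The value of name="Top" stays as it is, since only the text captured after href= is lowercased.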