stripformat.pl

by @jehiah on 2005-01-14 14:12UTC
Filed under: All, HTML, Articles, CSS

stripformat.pl is a script that removes the ugly markup MS FrontPage and MS Word put into HTML. It does not strip out all the HTML (though removing some of it is part of the job); it strips the styling: things like <o:p>, class=MsoNormal, and so on.

I originally based this on a script written by Duramecho titled StripFormattingFromWordGeneratedHtml.pl. The scope of what I wanted was a little different, so mine is more interactive on the command line and is geared more toward removing all styling and formatting from a page than toward cleaning up only Word-generated HTML.

This script will:

  • Remove all linked stylesheets
  • Remove all div blocks
  • Remove all class="Mso*"
  • Remove all <font> tags
  • Remove all <o:p> and <city:state> type tags
  • Remove all <span> tags
  • Remove all <style> blocks and style="..." attributes
  • Remove all <script> blocks
  • Remove empty tags, e.g. <b></b>
  • Remove the id=Autonumber1 from table tags
  • Allow for insertion of information in the head block, at the beginning of the body, and end of the body (template stuff)
  • Preserve a footer div block from deletion
  • Add a Doctype
  • Check for a FrontPage signature before executing
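
To give a rough idea of how a few of the rules above can work, here is a minimal sketch in Python. It is not the code from stripformat.pl (which is written in Perl); the function name strip_formatting and every regular expression in it are my own assumptions for illustration. It also assumes that "removing" a div, font or span means dropping the tag while keeping the text inside it, and it skips the footer, template, Doctype and FrontPage-signature steps.

```python
import re


def strip_formatting(html: str) -> str:
    """Illustrative only: strip common Word/FrontPage styling from an HTML string."""
    flags = re.IGNORECASE | re.DOTALL

    # Drop <style> and <script> blocks together with their contents.
    html = re.sub(r"<(style|script)\b[^>]*>.*?</\1\s*>", "", html, flags=flags)
    # Drop linked stylesheets.
    html = re.sub(r"<link\b[^>]*rel=[\"']?stylesheet[^>]*>", "", html, flags=flags)
    # Drop namespaced tags such as <o:p> and </o:p>, keeping any text between them.
    html = re.sub(r"</?\w+:\w+\b[^>]*>", "", html, flags=flags)
    # Drop <font>, <span> and <div> tags, keeping any text between them.
    html = re.sub(r"</?(font|span|div)\b[^>]*>", "", html, flags=flags)
    # Drop class=Mso* attributes (quoted or unquoted).
    html = re.sub(r"\s+class=(\"Mso[^\"]*\"|'Mso[^']*'|Mso[^\s>]*)", "", html, flags=flags)
    # Drop quoted style="..." attributes.
    html = re.sub(r"\s+style=(\"[^\"]*\"|'[^']*')", "", html, flags=flags)

    # Remove empty inline/paragraph tag pairs such as <b></b>; loop because
    # removing one pair can leave its parent empty.
    empty = re.compile(r"<(b|i|u|em|strong|p)\b[^>]*>\s*</\1\s*>", re.IGNORECASE)
    previous = None
    while previous != html:
        previous = html
        html = empty.sub("", html)
    return html


if __name__ == "__main__":
    sample = (
        '<p class=MsoNormal style="margin:0in"><span lang=EN-US>'
        '<font face="Arial">Hello<o:p></o:p></font></span></p><b></b>'
    )
    print(strip_formatting(sample))  # prints <p>Hello</p>
```

Regular expressions are a blunt tool for HTML, so a sketch like this is only reasonable for the fairly predictable markup that Word and FrontPage generate; anything more irregular would call for a real HTML parser.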

Download the version 4 stripformat.pl script

Jehiah Czebotar