Migrating static html pages to a Web CMS/Blog

By Dale Reagan | November 13, 2008

If you have a number of old web sites, or even a significant number of static HTML web pages, then you may decide that moving them into a web-based content management system (CMS), or even a blog, would be beneficial.

First, reasons NOT to migrate:

Some reasons to migrate pages:

  1. you have good/excellent content and you want to take advantage of new web-related presentation solutions
  2. you want to use one tool for your content editing, layout and design as well as some level of content management
  3. you want to maintain a consistent feature set and presentation across your domains

Shameless Plug!: Need help with a similar HTML to CMS, small or large project?  (hourly or project based rates.)

Ok – how to proceed?

Unfortunately, most web-based CMS tools** (that I have reviewed) offer limited, if any, static HTML page import/migration options. The problem is that these CMS solutions are really Presentation Management Solutions. In my view, a CMS should provide an easy/simple means to both import and export data/pages/images/whatever.  Drupal, CMS Made Simple, WordPress and other solutions usually do offer some level of ‘page/node’ import, but the options are limited, so it is usually a page at a time; some automation is definitely needed.  Note that there are quite a number of plugins for various types of data import, but I have not located a simple tool to handle importing multiple static HTML pages or even an entire tree of such pages.

When you start your hunt (run some search engine queries) for a solution you will (most likely) find a number of discussions suggesting:

Hmm, none of the preceding solutions work for me (I have hundreds of pages to ‘convert’.)   If I can type data into these systems then surely I should be able to auto-input existing data.  We are using databases, right?  When you start digging you will most likely find that these solutions have abstracted data off into some far corner of the system (remember, it’s not about the data but about the presentation.)  If you have not already engaged a consultant/programmer to ‘convert/import’ your existing pages then you might proceed in this fashion:

Before you start, you need to consider/evaluate your static HTML files:

In my test case I have files that were created using Microsoft FrontPage.  The pages utilize CSS and ‘shared borders’ as well as custom header and footer sections.  I want to extract the page body as well as retain some metadata (the <title> element, the keywords meta tag, etc.).  Since these pages were created following a standard layout, it is simple to extract the components that I want to retain.  Ok, I have my raw input pages – how do I get them into the CMS?
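As a rough illustration of the extraction step described above, here is a minimal Python sketch using only the standard library. It is not the script I used; the class and function names (PageExtractor, extract_page) are made up for this example, and it assumes each page has a single <title>, an optional keywords meta tag, and one <body>. HTML comments (such as FrontPage webbot markers) are dropped.

```python
from html import unescape
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect the <title> text, the keywords meta tag and the <body> markup."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &nbsp; intact in the body
        super().__init__(convert_charrefs=False)
        self.title = ""
        self.keywords = ""
        self.body_parts = []
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs_d = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs_d.get("name") or "").lower() == "keywords":
            self.keywords = attrs_d.get("content") or ""
        elif tag == "body":
            self._in_body = True
        elif self._in_body:
            # re-emit the tag exactly as it appeared in the source
            self.body_parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br /> or <meta ... />
        if self._in_body:
            self.body_parts.append(self.get_starttag_text())
        else:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif self._in_body:
            self.body_parts.append(f"</{tag}>")

    def _emit(self, raw):
        if self._in_title:
            self.title += unescape(raw)
        elif self._in_body:
            self.body_parts.append(raw)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def extract_page(html_text):
    """Return the title, keywords and body markup of one static page."""
    parser = PageExtractor()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.title.strip(),
        "keywords": parser.keywords.strip(),
        "body": "".join(parser.body_parts).strip(),
    }
```

Pages with shared borders or custom header/footer sections would still need those regions trimmed out of the body; that part depends entirely on how your own pages are laid out.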

Using 20 input pages as a test, I did have a small level of success trying CMS Made Simple -> Import_Content (a nice plugin!). With this module you can establish page relationships as well as apply some formatting on input, but if you are importing multiple pages then they will all be tied to one page (imagine selecting a menu option and then getting a drop-down list that contains hundreds of items; not a solution for my project).  At this point, what is missing is some way to retain overall page structure/relationships (or create new relationships on the fly) while importing the static HTML pages.

Data Relationships (mapping static HTML into your CMS database)

For your CMS/Blogging solution of choice you will need to acquire some level of understanding of both the structure of your static HTML data and how your CMS stores its data; at that point you can develop a mapping strategy to move your static pages into the CMS system.  It sure would be useful if the CMS included some documentation on the database structure; I did not locate any, so it was time for code digging.  WAIT! I found one possibly useful post about how to approach automating import into Drupal.  On his blog, Adam Smith proposes a two-step process to import pages into Drupal 6:

  1. create a file that contains your web page layout structure definition and then
  2. run a PHP script that both reads your layout specification and then plugs your static pages into the DB – looks promising!
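To make the idea of a layout definition file concrete, here is a small Python sketch. The file format is purely hypothetical (one page per line, two spaces of indentation per level, written as "label | path"); it is not the format Adam's script reads. The function name parse_layout is also made up. It works out each page's parent and a per-parent weight, which is the kind of information a menu import needs.

```python
from collections import defaultdict


def parse_layout(text, indent=2):
    """Turn an indented outline into entries with level, parent index and weight.

    Hypothetical input format: "label | path", indented `indent` spaces per level.
    """
    entries = []
    last_at_level = {}               # level -> index of most recent entry at that level
    child_count = defaultdict(int)   # parent index -> number of children seen so far

    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        stripped = raw.lstrip(" ")
        level = (len(raw) - len(stripped)) // indent
        label, _, path = (part.strip() for part in stripped.partition("|"))

        # the parent is the most recent entry one level up (None for top level)
        parent = last_at_level.get(level - 1) if level > 0 else None
        child_count[parent] += 1

        entries.append({
            "label": label,
            "path": path,
            "level": level,
            "parent": parent,
            "weight": child_count[parent],  # position among its siblings
        })
        last_at_level[level] = len(entries) - 1
    return entries
```

For example, an outline with "Home" at the top, "Products" and "Contact" beneath it, and "Widgets" beneath "Products" yields parents of None, 0, 1 and 0 in file order.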

Step one establishes the structure needed to avoid the import->layout problem I noted above.  Unfortunately, as written the import script did not work on my system…  However, it does provide some guidance into the Drupal ‘node/menu’ structure.  Extracting a PHP code snippet we see:

// create the page
      $node = new StdClass();
      $node->uid = 1;
      $node->type = 'page';
      $node->status = 1; // published
      $node->promote = 0; // don't promote to front page
      $node->path = $path; // ?q=path
      $node->format=3; // full HTML
      $node->title = $title;
      $node->body = ''; // add later
      node_save($node);

      $parentLevel = $level-1;
      $parentLevelInfo =& $levels[$parentLevel];

// create the menu item
      $menuItem = array();
      $menuItem['plid'] = $parentLevelInfo[0];
      $parentLevelInfo[1]++;
      $menuItem['weight']=$parentLevelInfo[1];
      $menuItem['link_path']='node/' . $node->nid;
      $menuItem['link_title']=$label;
      $menuItem['type']=118; // see includes/menu.inc
      menu_link_save($menuItem);

A quick review of the above code and we can see many of the database elements used for Drupal ‘nodes’ along with some guidance on the node path/menu structure.  In my case I want to use something like:

Taking this a bit further, I install a Drupal plugin, ‘export a node’, and get this output from test Blog and Page posts:

node(code(
'nid' => NULL,
'type' => 'blog',
'language' => '',
'uid' => '1',
'status' => '1',
'created' => NULL,
'changed' => '1226355506',
'comment' => '2',
'promote' => '1',
'moderate' => '0',
'sticky' => '0',
'tnid' => '0',
'translate' => '0',
'vid' => NULL,
'revision_uid' => '1',
'title' => 'Test blog entry',
'body' => 'This is a test blog entry
This is a test blog entry
This is a test blog entry
This is a test blog entry
',
'teaser' => 'This is a test blog entry
This is a test blog entry
This is a test blog entry
This is a test blog entry
',
'log' => '',
'revision_timestamp' => '1226355506',
'format' => '1',
'name' => 'dale',
'picture' => '',
'data' => 'a:0:{}',
'last_comment_timestamp' => '1226355506',
'last_comment_name' => NULL,
'comment_count' => '0',
'taxonomy' =>
array (
),
'files' =>
array (
),
'menu' => NULL,
'path' => NULL,
))
node(code(
'nid' => NULL,
'type' => 'page',
'language' => '',
'uid' => '1',
'status' => '1',
'created' => NULL,
'changed' => '1226356193',
'comment' => '0',
'promote' => '0',
'moderate' => '0',
'sticky' => '0',
'tnid' => '0',
'translate' => '0',
'vid' => NULL,
'revision_uid' => '1',
'title' => 'Test PAGE',
'body' => 'Thi sis a test page.
Thi sis a test page.
Thi sis a test page.
Thi sis a test page.',
'teaser' => 'Thi sis a test page.
Thi sis a test page.
Thi sis a test page.
Thi sis a test page.',
'log' => '',
'revision_timestamp' => '1226356193',
'format' => '1',
'name' => 'dale',
'picture' => '',
'data' => 'a:0:{}',
'last_comment_timestamp' => '1226356193',
'last_comment_name' => NULL,
'comment_count' => '0',
'taxonomy' =>
array (
),
'files' =>
array (
),
'menu' => NULL,
'path' => NULL,
))

Using the above layout, it would be relatively simple to create a script to re-format the static HTML files into the structure used by the Import/Export plugins available for the version of Drupal that I am using, and then import a page at a time…  Using something like Adam’s script mentioned above (written in your preferred language) would be one approach to automating the movement of the data into the system.  You might also consider using XML-RPC tools, which work really well when moving data from one database to another database.
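A re-formatting script along those lines might look like the Python sketch below. Treat it as a guess at the shape of the job, not a working importer: the field names and values are copied from the exported Page dump above, but the function names (php_quote, build_node, render_fields), the 600-character teaser cut-off, and the exact wrapper the plugin expects around these lines are all assumptions you would need to check against your own plugin.

```python
def php_quote(value):
    """Render a Python value as a PHP literal: NULL or a single-quoted string."""
    if value is None:
        return "NULL"
    # inside PHP single quotes only backslash and single quote need escaping
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def build_node(title, body, node_type="page", uid=1, teaser_length=600):
    """Map one extracted static page onto a subset of the exported node fields."""
    return {
        "nid": None,        # left NULL so the CMS assigns an id on import
        "type": node_type,
        "uid": uid,
        "status": 1,        # published
        "comment": 0,       # same value as the exported Page dump
        "promote": 0,       # do not promote to the front page
        "title": title,
        "body": body,
        # naive cut-off (assumed length); this can split an HTML tag in long pages
        "teaser": body[:teaser_length],
        "format": 1,
    }


def render_fields(node):
    """Emit one "'key' => 'value'," line per field, as in the dump layout."""
    return "\n".join(f"{php_quote(k)} => {php_quote(v)}," for k, v in node.items())
```

Calling render_fields(build_node("Test PAGE", "<p>Hello</p>")) produces lines such as 'type' => 'page', which can then be wrapped in whatever outer structure the import plugin requires.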

Note that you *should* be able to locate the Drupal database schema for each table in the database that you are using; look under the module folders where data structures are declared, e.g. ~/modules/node/node.install. A partial listing from the node.install file:


function node_schema() {
  $schema['node'] = array(
    'description' => t('The base table for nodes.'),
    'fields' => array(
      'nid' => array(
        'description' => t('The primary identifier for a node.'),
        'type' => 'serial',
        'unsigned' => TRUE,
        'not null' => TRUE),
      'vid' => array(
        'description' => t('The current {node_revisions}.vid version identifier.'),
        'type' => 'int',
        'unsigned' => TRUE,
        'not null' => TRUE,
        'default' => 0),
      'type' => array(
        'description' => t('The {node_type}.type of this node.'),
        'type' => 'varchar',
        'length' => 32,
        'not null' => TRUE,
        'default' => ''),

In addition to examining the actual database tables you can start a command shell and then ‘cd’ to your Drupal install folder and use the command below to locate additional files for review.

grep -li schema modules/*/*install

(Note that I added the numbering in WordPress; your list will vary since it depends upon which modules you have installed.)

  1. modules/aggregator/aggregator.install
  2. modules/block/block.install
  3. modules/blogapi/blogapi.install
  4. modules/book/book.install
  5. modules/comment/comment.install
  6. modules/contact/contact.install
  7. modules/datasync/datasync.install
  8. modules/dblog/dblog.install
  9. modules/docapi/docapi.install
  10. modules/filter/filter.install
  11. modules/forum/forum.install
  12. modules/job_queue/job_queue.install
  13. modules/locale/locale.install
  14. modules/menu/menu.install
  15. modules/node_import/node_import.install
  16. modules/node/node.install
  17. modules/openid/openid.install
  18. modules/poll/poll.install
  19. modules/profile/profile.install
  20. modules/search/search.install
  21. modules/statistics/statistics.install
  22. modules/system/system.install
  23. modules/taxonomy/taxonomy.install
  24. modules/trigger/trigger.install
  25. modules/update/update.install
  26. modules/upload/upload.install
  27. modules/user/user.install
  28. modules/views/views.install
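If you want the table names as well as the file names, a short script can pull them out of the same files. The Python sketch below is one way to do it, assuming the install files declare tables with lines of the form $schema['tablename'] = array( as in the node.install listing above; the function name schema_tables is made up for this example, and modules that build their schema differently will be missed.

```python
import re
from pathlib import Path

# matches "$schema['node'] = ..." but not "$schema['node']['fields'] = ..."
SCHEMA_TABLE = re.compile(r"\$schema\[\s*['\"](\w+)['\"]\s*\]\s*=")


def schema_tables(drupal_root):
    """Map each modules/*/*.install file to the table names it declares."""
    root = Path(drupal_root)
    tables = {}
    for install_file in sorted(root.glob("modules/*/*.install")):
        text = install_file.read_text(encoding="utf-8", errors="replace")
        names = sorted(set(SCHEMA_TABLE.findall(text)))
        if names:  # skip install files that declare no tables
            tables[install_file.relative_to(root).as_posix()] = names
    return tables


if __name__ == "__main__":
    # run from the Drupal install folder, as with the grep command above
    for path, names in schema_tables(".").items():
        print(path, "->", ", ".join(names))
```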

The ideal solution for this data-migration/import task would be one that was part of the content management system – please let me know if you find (or write) one!   The simplest solution will be to use SQL statements and load the data into appropriate tables. The only problem with this approach is that you lose any auto-magic data tagging that the CMS might be doing (e.g. updating counters, indexes, secondary files, etc.)

** – If I were reviewing commercial CMS solutions I would expect to find data import/conversion tools as standard features…

Update – there is a tool (module) for Drupal 5.x that might provide a round-about solution for importing static HTML into Drupal 6.

  1. install, configure Drupal 5.x
  2. install, configure the module Html_Import
  3. import your static pages, review and refine as desired
  4. install, configure Drupal 6.x
  5. follow the documented migration steps for moving from Drupal 5.x to Drupal 6.x

In my limited test I encountered a number of problems, mostly having to do with PHP XML tools (required at the OS level), so I did not move past step 3 above.  In general, wonderful as they might be, many Open Source solutions include the use of myriad other Open Source solutions.  The combination of all of these variables may make the Open Source approach more time consuming than results producing; that is just something to consider when weighing the use of Open Source and commercial solutions.  When tackling somewhat complex problems, if you have adequate resources to put into an effort AND if there is a long term benefit, then Open Source can be a win-win.  If your needs are simple/basic then Open Source solutions are a win as soon as you start using them.

From my chair (FMC), the fastest way to import static HTML pages is to simply create SQL statements to load the data into your combination of frontend/database (Drupal, CMSMS, WordPress, etc.)  If needed, you could follow up the raw data import with SQL statements to update counters and indexes…
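To show what the raw SQL route looks like in practice, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for the real backend, and the two tables are heavily simplified versions of node and node_revisions with only a handful of columns; the column list, the function names and the 600-character teaser are assumptions for illustration, so check your own schema before adapting any of it. The final UPDATE plays the role of the follow-up statements mentioned above.

```python
import sqlite3
import time


def create_demo_tables(conn):
    """Create simplified stand-ins for the node and node_revisions tables."""
    conn.executescript("""
        CREATE TABLE node (
            nid     INTEGER PRIMARY KEY AUTOINCREMENT,
            vid     INTEGER NOT NULL DEFAULT 0,
            type    TEXT    NOT NULL DEFAULT '',
            title   TEXT    NOT NULL DEFAULT '',
            uid     INTEGER NOT NULL DEFAULT 0,
            status  INTEGER NOT NULL DEFAULT 1,
            created INTEGER NOT NULL DEFAULT 0,
            changed INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE node_revisions (
            vid    INTEGER PRIMARY KEY AUTOINCREMENT,
            nid    INTEGER NOT NULL,
            title  TEXT    NOT NULL DEFAULT '',
            body   TEXT    NOT NULL,
            teaser TEXT    NOT NULL,
            format INTEGER NOT NULL DEFAULT 0
        );
    """)


def load_page(conn, title, body, uid=1, node_type="page"):
    """Insert one page and return its new node id."""
    now = int(time.time())
    with conn:  # one transaction per page: all three statements or none
        cur = conn.execute(
            "INSERT INTO node (type, title, uid, status, created, changed) "
            "VALUES (?, ?, ?, 1, ?, ?)",
            (node_type, title, uid, now, now),
        )
        nid = cur.lastrowid
        cur = conn.execute(
            "INSERT INTO node_revisions (nid, title, body, teaser, format) "
            "VALUES (?, ?, ?, ?, 1)",
            (nid, title, body, body[:600]),
        )
        # follow-up statement: point the node at its current revision
        conn.execute("UPDATE node SET vid = ? WHERE nid = ?", (cur.lastrowid, nid))
    return nid
```

The placeholders (?) let the database driver handle quoting, which matters when page bodies are full of quotes and angle brackets. Anything else the CMS normally maintains (menus, path aliases, search indexes) is not touched by a load like this and would need its own statements.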
