From Drupal-based blog to Yii-based application

If you are in need to migrate Drupal’s content data (articles, pages, events, etc.) to your own CMS or database structure then this article may be helpful for you.

Note however, that I have little to none knowledge of Drupal and that my MySQL experience is very limited. Therefore, information provided in this article may be useful, but on the other hand, I can write something wrong (even up to a complete bullshit) or mess things around.

Use this text with a caution and at your own risk. You have been warned. No money back guaranteed! :>

Disclaimers

First of all, I have been using Drupal for a very, very limited period of time (few hours at most). And I don’t know nearly anything about it. I was faced with one-time task of ripping content from one of simple blog based on Drupal and putting it into database, that powered my own version of that blog. There’s even more. I have never actually touched Drupal installation, from which I was importing content data. I was operating purely on database dump provided by site admin.

Second of all, I have very limited knowledge of MySQL, as I’ve been using active records in most of my applications to access data from database. Don’t even think about evaluating quality of my queries, as I can already tell you, that you haven’t seen worse in your entire life! :> I limited this part of my job to “hey, it’s working and it is giving us results, we’ve been expecting, right?” level. Sorry.

Tables and SQL

Main page content

Of course, we’re talking here only about content data, not about entire, huge database that stands behind Drupal installations.

Drupal-based site’s content is spread among few tables, while most simple blogs holds it in one table. That’s the biggest problem, this article is trying to fight with.

In Drupal installation, that I’ve been working on, all database tables were named in pattern [sitename]_[tablename]. For clarity, I removed [sitename] part from following examples. Therefore, you’ll find field_data_body table name here, which actually corresponds to johndoe_field_data_body table, if your installation is named johndoe.

Here is an example of relation map, i.e. how to map data in your application, stored entirely in one table, to data in Drupal, spread among few tables:

  • author: contents.author_id –> node.uid,
  • type: contents.type –> node.type,
  • status: contents.status –> node.status,
  • title: contents.title –> node.title,
  • introduction: contents.lead –> field_data_body.body_summary,
  • main content: contents.body –> field_data_body.body_value,
  • is it on front page: contents.featured –> node.promote,
  • is it on top of lists: contents.recommended –> node.sticky,
  • creation date: contents.creation_date –> node.created,
  • last change date: contents.modification_date –> node.changed,
  • start date (for events): contents.event_start_date –> field_data_field_date.field_date_value,
  • end date (for events): contents.event_end_date –> field_data_field_date.field_date_value2,
  • slug, short or alias used in pretty URLs: contents.short –> url_alias,
  • content language: contents.language –> node.language.

Each element (entity, node, content item, etc.) is referenced by:

  • nid in node table,
  • entity_id in field_data_* like tables,
  • textual (!) in url_alias table (i.e. string node/12 stands for nid or entity_id = 12).

Tags, users and their passwords

In addition to above, taxonomy_* tables holds information about tags (and other taxonomy elements), but I skipped that part (tags) in my blog.

There’s entirely different story about importing users (and hases of their passwords), as you need to have them before importing content, since you need to provide uid for each content element. You’re lucky, if you don’t use user-oriented items in your blog or if you import all items attached to single user (fixed uid). I was lucky to import only user e-mails, as my application uses LDAP-based login. If you have to import them with their original passwords, then these two articles may be a good start:

Body and title

Having Drupal-based site’s database structure for content determined, I started crafting join queries to rip data from many tables and put it into single structure in phpMyAdmin, which could later be exported as PHP array.

I started easily, by fetching all pages in my source blog and joining together title with body:

SELECT
    n.title,
    b.body_value AS body
FROM
    node AS n
INNER JOIN
    field_data_body AS b
ON
    b.entity_id = n.nid
WHERE
    n.type = 'page'
GROUP BY
    n.nid

First pitfall. Above query returned 26 records in my case, while it should 28 (there were 28 pages in my source page). Quite normal, as it uses inner join, which returns intersection of all joined columns (details). This means, that it does not returns rows from table n, which has no corresponding element in table b. Chaning INNER JOIN to LEFT JOIN fixed problem.

Fixed values

Since I decided to import each type of content in a separate migration and since Drupal distinguishes content types using strings, while my application uses numbers (references to proper dictionary), I decided to add type column to my result set as fixed value. To achieve this, I had to add '3' AS type to SELECT section of my query.

The same thing was with categories, not used in my source page and therefore set to fixed value of 1.

This way, my select part looked like this:

SELECT
    '3' AS type,
    n.title,
    b.body_value AS body,
    '1' AS category

With rest part of query unchanged.

Other columns copied directly

Now, I took my hands on other columns, that could be fetched from using my join query directly — i.e. those columns, for which I only change name or even leave untouched. That included: created, changed, promote and sticky (columns with changed name) plus language and status (copied without even changing name).

I also wanted to include user, which created particular element. Drupal stores users in users table. But, since my destination application required only user ID, I could treat uid column as last copied directly (with name change), that does not require extra join.

This required to extend SELECT part of my query to:

SELECT
    n.uid AS author_id,
    '3' AS type,
    n.status,
    n.title,
    b.body_value AS body,
    '1' AS category,
    n.promote AS featured,
    n.sticky AS recommended,       
    n.created AS creation_date,
    n.changed AS modification_date,
    n.language

Second pitfall and second thing to look for. My source table stores dates (n.created and n.changed) in TIMESTAMP type field, while my destination table as DATETIME type field. Quick search revealed, that FROM_UNIXTIME function should do the trick here. Therefore, I changed:

n.created AS creation_date,
n.changed AS modification_date,

part in my SELECT clause to:

FROM_UNIXTIME(n.created) AS creation_date,
FROM_UNIXTIME(n.changed) AS modification_date,

At this point, my query looked like this:

SELECT
    n.uid AS author_id,
    '3' AS type,
    n.status,
    n.title,
    b.body_value AS body,
    '1' AS category,
    n.promote AS featured,
    n.sticky AS recommended,
    FROM_UNIXTIME(n.created) AS creation_date,
    FROM_UNIXTIME(n.changed) AS modification_date,
    n.language
FROM
    node AS n
LEFT JOIN
    field_data_body AS b
ON
    b.entity_id = n.nid
WHERE
    n.type = 'page'
GROUP BY
    n.nid

Adding “pretty” URL

Last step (for non-event content type) was to add URL alias (slug, short) used to generate “pretty” URL to particular item. This step required another join, because this data is stored in url_alias table (alias field) in Drupal. However, the problem in this step was, that Drupal identifies nodes in url_alias table using strings (source column; like node/131) not integer identifiers.

After another digging, I found out, that nice solution to this would be using CONCAT function for concatenating strings:

u.source = CONCAT('node/', n.nid)

With this, second join in my query looked like this:

LEFT JOIN
    url_alias AS u
ON
    u.source = CONCAT('node/', n.nid)

And the entire query (final version for importing pages):

SELECT
    n.uid AS author_id,
    '3' AS type,
    n.status,
    n.title,
    b.body_value AS body,
    '1' AS category,
    n.promote AS featured,
    n.sticky AS recommended,
    FROM_UNIXTIME(n.created) AS creation_date,
    FROM_UNIXTIME(n.changed) AS modification_date,
    u.alias AS short,
    n.language
FROM
    node AS n
LEFT JOIN
    url_alias AS u
ON
    u.source = CONCAT('node/', n.nid)
LEFT JOIN
    field_data_body AS b
ON
    b.entity_id = n.nid
WHERE
    n.type = 'page'
GROUP BY
    n.nid

Unfortunately, this point revealed the biggest mistake in designing database structure and application logic, I made in past few years. Some of youmay already see this.

I have assumed, that each content in my new application will have one URL alias an each change to it (no matter, whether manual or automatic) will be overwriting last alias. Can you imagine this? Someone mistakenly changes alias (or even worse — it changes automatically with each title change) and all the links on external pages or all the user browser bookmarks suddenly becomes invalid! Madness!

To the migration

To convert each of above query into Yii migration, we have to execute it, export result set as PHP array and tune it up a little bit. This part is covered by my [another article](Converting database structure or .sql file into Yii migration.txt).

Importing articles

For importing articles, I used exactly the same query as above, with only two exceptions:

  • '3' AS type, replaced by '4' AS type, in SELECT part,
  • n.type = 'page' replaced by n.type = 'article' in WHERE part.

All other steps reminded without any change.

Importing events

First step into writing Yii migration for importing Drupal events was equal as above:

  • '4' AS type, replaced by '5' AS type, in SELECT part,
  • n.type = 'article' replaced by n.type = 'event' in WHERE part.

However, for events we need another (third) join, because event’s start and end dates are stored in field_data_field_date table. Therefore, I needed to add:

LEFT JOIN
    field_data_field_date AS d
ON
    d.entity_id = n.nid

as last (third) join, and:

d.field_date_value AS event_start_date,
d.field_date_value2 AS event_end_date,

to SELECT part. Notice, that we don’t use FROM_UNIXTIME here, because event’s start and end date are stored as DATETIME in field_data_field_date, not as TIMESTAMP.

And the final query for ripping events from Drupal database is:

SELECT
    n.uid AS author_id,
    '5' AS type,
    n.status,
    n.title,
    b.body_value AS body,
    '1' AS category,
    n.promote AS featured,
    n.sticky AS recommended,
    FROM_UNIXTIME(n.created) AS creation_date,
    FROM_UNIXTIME(n.changed) AS modification_date,
    d.field_date_value AS event_start_date,
    d.field_date_value2 AS event_end_date,
    u.alias AS short,
    n.language
FROM
    node AS n
LEFT JOIN
    url_alias AS u
ON
    u.source = CONCAT('node/', n.nid)
LEFT JOIN
    field_data_body AS b
ON
    b.entity_id = n.nid
LEFT JOIN
    field_data_field_date AS d
ON
    d.entity_id = n.nid
WHERE
    n.type = 'event'
GROUP BY
    n.nid

What to do next?

If you’re writing (as I do) applications in Yii2, you can export data to a .php file and then use [these simple tricks](Converting database structure or .sql file into Yii migration.txt) to convert that file into Yii migration.

If you’re writing your code in different framework or in pure PHP, you can use INSERT ... SELECT syntax to improt data directly to your own database / data structure. More information about this syntax, in case you haven’t used it before, can be found anywhere across the Internet, for example at Stack Overflow, at W3Schools or here. Have fun!

Leave a Reply