From Drupal-based blog to Yii-based application
If you need to migrate Drupal's content data (articles, pages, events, etc.) to your own CMS or database structure, then this article may be helpful for you.
Note, however, that I have little to no knowledge of Drupal and that my MySQL experience is very limited. The information provided in this article may be useful, but on the other hand, I may have written something wrong (up to complete bullshit) or mixed things up.
Use this text with caution and at your own risk. You have been warned. No money-back guarantee! :>
Disclaimers
First of all, I have used Drupal for a very, very limited period of time (a few hours at most), and I know next to nothing about it. I was faced with a one-time task of ripping content from a simple Drupal-based blog and putting it into the database that powered my own version of that blog. There's more: I have never actually touched the Drupal installation from which I was importing content data. I was operating purely on a database dump provided by the site admin.
Second of all, I have very limited knowledge of MySQL, as I've been using active records in most of my applications to access the database. Don't even think about evaluating the quality of my queries; I can already tell you that you haven't seen worse in your entire life! :> I limited this part of my job to the "hey, it works and it gives us the results we expected, right?" level. Sorry.
Tables and SQL
Main page content
Of course, we're talking here only about content data, not about the entire, huge database that stands behind a Drupal installation.
A Drupal-based site's content is spread among several tables, while most simple blogs hold it in one table. That's the biggest problem this article is trying to tackle.
In the Drupal installation I was working on, all database tables were named in the pattern `[sitename]_[tablename]`. For clarity, I removed the `[sitename]` part from the following examples. Therefore, you'll find the `field_data_body` table name here, which actually corresponds to the `johndoe_field_data_body` table if your installation is named `johndoe`.
Here is an example relation map, i.e. how to map data in your application (stored entirely in one table) to data in Drupal (spread among several tables); a rough sketch of the destination table follows the list:

- author: `contents.author_id` -> `node.uid`,
- type: `contents.type` -> `node.type`,
- status: `contents.status` -> `node.status`,
- title: `contents.title` -> `node.title`,
- introduction: `contents.lead` -> `field_data_body.body_summary`,
- main content: `contents.body` -> `field_data_body.body_value`,
- is it on the front page: `contents.featured` -> `node.promote`,
- is it on top of lists: `contents.recommended` -> `node.sticky`,
- creation date: `contents.creation_date` -> `node.created`,
- last change date: `contents.modification_date` -> `node.changed`,
- start date (for events): `contents.event_start_date` -> `field_data_field_date.field_date_value`,
- end date (for events): `contents.event_end_date` -> `field_data_field_date.field_date_value2`,
- slug, short or alias used in pretty URLs: `contents.short` -> `url_alias.alias`,
- content language: `contents.language` -> `node.language`.
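For reference, here is a rough sketch of what such a destination `contents` table could look like. This is not taken from my actual application; the column names follow the mapping above, while the types are only my guesses (dates are `DATETIME`, as mentioned later in this text):

CREATE TABLE contents (
id INT AUTO_INCREMENT PRIMARY KEY, -- assumption: a surrogate key, not part of the mapping
author_id INT,
type INT,
category INT,
status TINYINT,
title VARCHAR(255),
lead TEXT,
body TEXT,
featured TINYINT,
recommended TINYINT,
creation_date DATETIME,
modification_date DATETIME,
event_start_date DATETIME,
event_end_date DATETIME,
short VARCHAR(255),
language VARCHAR(12)
)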
Each element (entity, node, content item, etc.) is referenced by:

- `nid` in the `node` table,
- `entity_id` in `field_data_*`-like tables,
- a textual (!) reference in the `url_alias` table (i.e. the string `node/12` stands for `nid` or `entity_id` = `12`); see the example below.
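For example, to find the alias of node `12`, you have to look it up by that textual reference (a join-friendly version of this lookup, using `CONCAT`, is shown later in this article):

SELECT
u.alias
FROM
url_alias AS u
WHERE
u.source = 'node/12'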
In addition to the above, the `taxonomy_*` tables hold information about tags (and other taxonomy elements), but I skipped that part (tags) in my blog.
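If you do need tags, a minimal sketch of how to fetch them per node, assuming a standard Drupal 7 schema with `taxonomy_index` and `taxonomy_term_data` tables (I haven't used this myself), could look like this:

SELECT
ti.nid,
td.name AS tag
FROM
taxonomy_index AS ti
INNER JOIN
taxonomy_term_data AS td
ON
td.tid = ti.tid
ORDER BY
ti.nid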
Importing users (and the hashes of their passwords) is an entirely different story, as you need to have them in place before importing content, since you have to provide a `uid` for each content element. You're lucky if you don't use user-oriented items in your blog or if you import all items attached to a single user (fixed `uid`). I was lucky enough to need only user e-mails, as my application uses LDAP-based login. If you have to import users with their original passwords, then these two articles may be a good start (a minimal query for pulling account data follows the list):
- What is Drupal’s default password encryption method? at Stack Overflow,
- Function `user_hash_password` description in the Drupal 7 API.
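In my case only the e-mail addresses mattered, but a minimal sketch for pulling basic account data, assuming the standard Drupal 7 `users` table (adjust the column list to what your application actually needs), could look like this:

SELECT
u.uid,
u.name,
u.mail,
u.pass -- hashed password, see the links above for the hashing scheme
FROM
users AS u
WHERE
u.uid > 0 -- skip the built-in anonymous user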
Body and title
Having determined the Drupal-based site's database structure for content, I started crafting join queries in phpMyAdmin to rip data from many tables and put it into a single structure, which could later be exported as a PHP array.
I started easy, by fetching all pages from my source blog and joining title with body:
SELECT
n.title,
b.body_value AS body
FROM
node AS n
INNER JOIN
field_data_body AS b
ON
b.entity_id = n.nid
WHERE
n.type = 'page'
GROUP BY
n.nid
First pitfall. The above query returned 26 records in my case, while it should have returned 28 (there were 28 pages on my source site). Quite normal, as it uses an inner join, which returns only the intersection of the joined tables (details). This means that it does not return rows from table `n` that have no corresponding element in table `b`. Changing `INNER JOIN` to `LEFT JOIN` fixed the problem.
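So the join part of the query, used in all queries from this point on, became:

FROM
node AS n
LEFT JOIN
field_data_body AS b
ON
b.entity_id = n.nid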
Fixed values
Since I decided to import each type of content in a separate migration, and since Drupal distinguishes content types using strings while my application uses numbers (references to a proper dictionary), I decided to add the `type` column to my result set as a fixed value. To achieve this, I had to add `'3' AS type` to the `SELECT` section of my query.
The same applied to categories, which were not used on my source site and were therefore set to a fixed value of `1`.
This way, my `SELECT` part looked like this:
SELECT
'3' AS type,
n.title,
b.body_value AS body,
'1' AS category
With the rest of the query unchanged.
Other columns copied directly
Next, I took on the other columns that could be fetched directly with my join query, i.e. those for which I only change the name or leave untouched. That included `created`, `changed`, `promote` and `sticky` (columns with changed names) plus `language` and `status` (copied without even changing the name).
I also wanted to include the user who created each element. Drupal stores users in the `users` table, but since my destination application required only the user ID, I could treat the `uid` column as the last one copied directly (with a name change) that does not require an extra join.
This required extending the `SELECT` part of my query to:
SELECT
n.uid AS author_id,
'3' AS type,
n.status,
n.title,
b.body_value AS body,
'1' AS category,
n.promote AS featured,
n.sticky AS recommended,
n.created AS creation_date,
n.changed AS modification_date,
n.language
Second pitfall and second thing to look out for. My source table stores dates (`n.created` and `n.changed`) as Unix timestamps, while my destination table uses `DATETIME` fields. A quick search revealed that the `FROM_UNIXTIME` function should do the trick here. Therefore, I changed:
n.created AS creation_date,
n.changed AS modification_date,
part of my `SELECT` clause to:
FROM_UNIXTIME(n.created) AS creation_date,
FROM_UNIXTIME(n.changed) AS modification_date,
At this point, my query looked like this:
SELECT
n.uid AS author_id,
'3' AS type,
n.status,
n.title,
b.body_value AS body,
'1' AS category,
n.promote AS featured,
n.sticky AS recommended,
FROM_UNIXTIME(n.created) AS creation_date,
FROM_UNIXTIME(n.changed) AS modification_date,
n.language
FROM
node AS n
LEFT JOIN
field_data_body AS b
ON
b.entity_id = n.nid
WHERE
n.type = 'page'
GROUP BY
n.nid
Adding “pretty” URL
The last step (for non-event content types) was to add the URL alias (slug, short) used to generate a "pretty" URL for each item. This step required another join, because this data is stored in the `url_alias` table (`alias` field) in Drupal. The problem here was that Drupal identifies nodes in the `url_alias` table using strings (the `source` column, like `node/131`), not integer identifiers.
After some more digging, I found out that a nice solution would be to use the `CONCAT` function for concatenating strings:
u.source = CONCAT('node/', n.nid)
With this, the second join in my query looked like this:
LEFT JOIN
url_alias AS u
ON
u.source = CONCAT('node/', n.nid)
And the entire query (final version for importing pages):
SELECT
n.uid AS author_id,
'3' AS type,
n.status,
n.title,
b.body_value AS body,
'1' AS category,
n.promote AS featured,
n.sticky AS recommended,
FROM_UNIXTIME(n.created) AS creation_date,
FROM_UNIXTIME(n.changed) AS modification_date,
u.alias AS short,
n.language
FROM
node AS n
LEFT JOIN
url_alias AS u
ON
u.source = CONCAT('node/', n.nid)
LEFT JOIN
field_data_body AS b
ON
b.entity_id = n.nid
WHERE
n.type = 'page'
GROUP BY
n.nid
Unfortunately, this point revealed the biggest database structure and application logic design mistake I had made in the past few years. Some of you may already see it.
I had assumed that each content item in my new application would have one URL alias and that each change to it (no matter whether manual or automatic) would overwrite the previous alias. Can you imagine that? Someone mistakenly changes the alias (or even worse, it changes automatically with each title change) and all the links on external pages or all the users' browser bookmarks suddenly become invalid! Madness!
To the migration
To convert each of the above queries into a Yii migration, we have to execute it, export the result set as a PHP array and tune it up a little bit. This part is covered by [another article of mine](Converting database structure or .sql file into Yii migration.txt).
Importing articles
For importing articles, I used exactly the same query as above, with only two exceptions:
- `'3' AS type,` replaced by `'4' AS type,` in the `SELECT` part,
- `n.type = 'page'` replaced by `n.type = 'article'` in the `WHERE` part.
All other steps remained unchanged.
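For completeness, here is the final query for importing articles, i.e. the pages query with these two substitutions applied:

SELECT
n.uid AS author_id,
'4' AS type,
n.status,
n.title,
b.body_value AS body,
'1' AS category,
n.promote AS featured,
n.sticky AS recommended,
FROM_UNIXTIME(n.created) AS creation_date,
FROM_UNIXTIME(n.changed) AS modification_date,
u.alias AS short,
n.language
FROM
node AS n
LEFT JOIN
url_alias AS u
ON
u.source = CONCAT('node/', n.nid)
LEFT JOIN
field_data_body AS b
ON
b.entity_id = n.nid
WHERE
n.type = 'article'
GROUP BY
n.nid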
Importing events
The first step in writing a Yii migration for importing Drupal events was analogous to the above:
- `'4' AS type,` replaced by `'5' AS type,` in the `SELECT` part,
- `n.type = 'article'` replaced by `n.type = 'event'` in the `WHERE` part.
However, for events we need another (third) join, because the event's start and end dates are stored in the `field_data_field_date` table. Therefore, I needed to add:
LEFT JOIN
field_data_field_date AS d
ON
d.entity_id = n.nid
as the last (third) join, and:
d.field_date_value AS event_start_date,
d.field_date_value2 AS event_end_date,
to the `SELECT` part. Notice that we don't use `FROM_UNIXTIME` here, because the event's start and end dates are stored as `DATETIME` in `field_data_field_date`, not as Unix timestamps.
And the final query for ripping events from the Drupal database is:
SELECT
n.uid AS author_id,
'5' AS type,
n.status,
n.title,
b.body_value AS body,
'1' AS category,
n.promote AS featured,
n.sticky AS recommended,
FROM_UNIXTIME(n.created) AS creation_date,
FROM_UNIXTIME(n.changed) AS modification_date,
d.field_date_value AS event_start_date,
d.field_date_value2 AS event_end_date,
u.alias AS short,
n.language
FROM
node AS n
LEFT JOIN
url_alias AS u
ON
u.source = CONCAT('node/', n.nid)
LEFT JOIN
field_data_body AS b
ON
b.entity_id = n.nid
LEFT JOIN
field_data_field_date AS d
ON
d.entity_id = n.nid
WHERE
n.type = 'event'
GROUP BY
n.nid
What to do next?
If you're writing applications in Yii 2 (as I do), you can export the data to a `.php` file and then use [these simple tricks](Converting database structure or .sql file into Yii migration.txt) to convert that file into a Yii migration.
If you're writing your code in a different framework or in pure PHP, you can use the `INSERT ... SELECT` syntax to import the data directly into your own database / data structure. More information about this syntax, in case you haven't used it before, can be found all over the Internet, for example at Stack Overflow, at W3Schools or here (a rough sketch of such a query follows below). Have fun!
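A minimal sketch for pages, assuming a destination table named `contents` with columns matching the mapping at the beginning of this article and both databases available on the same MySQL server (prefix the table names with database names if they live in separate schemas):

INSERT INTO contents
(author_id, type, status, title, body, category, featured, recommended, creation_date, modification_date, short, language)
SELECT
n.uid,
'3',
n.status,
n.title,
b.body_value,
'1',
n.promote,
n.sticky,
FROM_UNIXTIME(n.created),
FROM_UNIXTIME(n.changed),
u.alias,
n.language
FROM
node AS n
LEFT JOIN
url_alias AS u
ON
u.source = CONCAT('node/', n.nid)
LEFT JOIN
field_data_body AS b
ON
b.entity_id = n.nid
WHERE
n.type = 'page'
GROUP BY
n.nid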