If you need to migrate Drupal's content data (articles, pages, events, etc.) to your own CMS or database structure, then this article may be helpful for you.
Note, however, that I have little to no knowledge of Drupal and that my MySQL experience is very limited. The information provided in this article may be useful, but on the other hand, I may get something wrong (even up to complete bullshit) or mix things up.
Use this text with caution and at your own risk. You have been warned. No money-back guarantee! :>
First of all, I have used Drupal for a very, very limited period of time (a few hours at most), and I know next to nothing about it. I was faced with the one-time task of ripping content out of a simple Drupal-based blog and putting it into the database that powered my own version of that blog. What's more, I never actually touched the Drupal installation I was importing content from. I was operating purely on a database dump provided by the site admin.
Second of all, I have very limited knowledge of MySQL, as I've been using active records in most of my applications to access the database. Don't even think about evaluating the quality of my queries; I can already tell you that you haven't seen worse in your entire life! :> I limited this part of the job to the "hey, it's working and it's giving us the results we expected, right?" level. Sorry.
Tables and SQL
Main page content
Of course, we're talking here only about content data, not about the entire, huge database that stands behind a Drupal installation.
A Drupal-based site's content is spread among several tables, while most simple blogs hold it in a single table. That's the biggest problem this article is trying to fight.
In the Drupal installation I was working on, all database tables were named in the pattern [sitename]_[tablename]. For clarity, I removed the [sitename] part from the following examples. Therefore, you'll find the field_data_body table name here, which actually corresponds to the johndoe_field_data_body table if your installation is named johndoe.
Here is an example relation map, i.e. how data stored entirely in one table in your application maps to data in Drupal, spread among several tables (reconstructed from the queries used later in this article):
- main content: body_value in field_data_body,
- is it on the front page: promote in node,
- is it on top of lists: sticky in node,
- creation date: created in node,
- last change date: changed in node,
- start date (for events): field_date_value in field_data_field_date,
- end date (for events): field_date_value2 in field_data_field_date,
- slug, short or alias used in pretty URLs: alias in url_alias,
- content language: language in node.
Each element (entity, node, content item, etc.) is referenced by:
- an integer node ID (nid) in most tables,
- a textual (!) reference in the url_alias table (i.e. a string like node/131).
In addition to the above, the taxonomy_* tables hold information about tags (and other taxonomy elements), but I skipped that part (tags) in my blog.
There’s entirely different story about importing users (and hases of their passwords), as you need to have them before importing content, since you need to provide uid for each content element. You’re lucky, if you don’t use user-oriented items in your blog or if you import all items attached to single user (fixed uid). I was lucky to import only user e-mails, as my application uses LDAP-based login. If you have to import them with their original passwords, then these two articles may be a good start:
- What is Drupal’s default password encryption method? at Stack Overflow,
user_hash_passworddescription in Drupal 7 API.
Body and title
Having determined the Drupal site's database structure for content, I started crafting join queries to rip data from many tables and assemble it into a single structure in phpMyAdmin, which could later be exported as a PHP array.
I started easy, by fetching all pages in my source blog and joining the title with the body:
SELECT n.title, b.body_value AS body FROM node AS n INNER JOIN field_data_body AS b ON b.entity_id = n.nid WHERE n.type = 'page' GROUP BY n.nid
First pitfall. The above query returned 26 records in my case, while it should have returned 28 (there were 28 pages on my source site). Quite normal, as it uses an inner join, which returns the intersection of the joined tables. This means it does not return rows from table n that have no corresponding row in table b. Changing INNER JOIN to LEFT JOIN fixed the problem.
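To see the difference concretely, here is a minimal, runnable sketch of the same two joins against a toy schema (SQLite via Python's sqlite3; the table names follow the article, but the data is invented for illustration):

```python
import sqlite3

# Toy mock of the two Drupal tables used above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT);
CREATE TABLE field_data_body (entity_id INTEGER, body_value TEXT);
INSERT INTO node VALUES (1, 'page', 'About'), (2, 'page', 'Contact');
-- node 2 intentionally has no body row
INSERT INTO field_data_body VALUES (1, 'About us...');
""")

inner = con.execute("""
    SELECT n.title, b.body_value AS body
    FROM node AS n INNER JOIN field_data_body AS b ON b.entity_id = n.nid
    WHERE n.type = 'page'
""").fetchall()

left = con.execute("""
    SELECT n.title, b.body_value AS body
    FROM node AS n LEFT JOIN field_data_body AS b ON b.entity_id = n.nid
    WHERE n.type = 'page'
""").fetchall()

print(len(inner))  # 1 -- the body-less page is silently dropped
print(len(left))   # 2 -- LEFT JOIN keeps it, with body = NULL
```

The body-less page comes back from the LEFT JOIN with a NULL body, which is exactly what you want for a migration: better an empty body than a silently missing page.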
Since I decided to import each type of content in a separate migration, and since Drupal distinguishes content types using strings while my application uses numbers (references to a proper dictionary), I decided to add a type column to my result set as a fixed value. To achieve this, I added '3' AS type to the SELECT part of my query.
The same went for categories, which were not used on my source site and were therefore set to a fixed value of '1'.
This way, my select part looked like this:
SELECT '3' AS type, n.title, b.body_value AS body, '1' AS category
The rest of the query remained unchanged.
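As a quick illustration of such fixed-value columns, here is a tiny sketch (SQLite via Python's sqlite3, invented data) showing that the literals come back as constant columns on every row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
con.execute("INSERT INTO node VALUES (1, 'About'), (2, 'Contact')")

# '3' and '1' are not read from any table; they are emitted as-is per row.
rows = con.execute(
    "SELECT '3' AS type, title, '1' AS category FROM node ORDER BY nid"
).fetchall()
print(rows)  # [('3', 'About', '1'), ('3', 'Contact', '1')]
```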
Other columns copied directly
Now, I took on the other columns that could be fetched directly with my join query, i.e. those columns for which I only change the name or even leave it untouched. That included promote, sticky, created and changed (columns with changed names) plus status and language (copied without even changing the name).
I also wanted to include the user who created each element. Drupal stores users in the users table. But since my destination application required only the user ID, I could treat the uid column as the last one copied directly (with a name change) that does not require an extra join.
This required extending the SELECT part of my query to:
SELECT n.uid AS author_id, '3' AS type, n.status, n.title, b.body_value AS body, '1' AS category, n.promote AS featured, n.sticky AS recommended, n.created AS creation_date, n.changed AS modification_date, n.language
Second pitfall, and a second thing to look out for. My source table stores dates as Unix timestamps, while my destination table uses DATETIME fields. A quick search revealed that the FROM_UNIXTIME function should do the trick here. Therefore, I changed:
n.created AS creation_date, n.changed AS modification_date,
part of my SELECT clause to:
FROM_UNIXTIME(n.created) AS creation_date, FROM_UNIXTIME(n.changed) AS modification_date,
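Note that FROM_UNIXTIME is MySQL-specific; SQLite, for example, spells the same conversion as datetime(x, 'unixepoch'). A quick sketch of what the conversion produces (Python's sqlite3, arbitrary timestamp):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# 1388534400 is the Unix timestamp for 2014-01-01 00:00:00 UTC.
# In MySQL the equivalent would be: SELECT FROM_UNIXTIME(1388534400)
row = con.execute("SELECT datetime(1388534400, 'unixepoch')").fetchone()
print(row[0])  # '2014-01-01 00:00:00'
```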
At this point, my query looked like this:
SELECT n.uid AS author_id, '3' AS type, n.status, n.title, b.body_value AS body, '1' AS category, n.promote AS featured, n.sticky AS recommended, FROM_UNIXTIME(n.created) AS creation_date, FROM_UNIXTIME(n.changed) AS modification_date, n.language FROM node AS n LEFT JOIN field_data_body AS b ON b.entity_id = n.nid WHERE n.type = 'page' GROUP BY n.nid
Adding “pretty” URL
The last step (for non-event content types) was to add the URL alias (slug, short) used to generate the "pretty" URL for each item. This step required another join, because this data is stored in the url_alias table (alias field) in Drupal. However, the problem here was that Drupal identifies nodes in the url_alias table using strings (the source column; like node/131), not integer identifiers.
After some more digging, I found that a nice solution to this would be to use the CONCAT function for concatenating strings:
u.source = CONCAT('node/', n.nid)
With this, second join in my query looked like this:
LEFT JOIN url_alias AS u ON u.source = CONCAT('node/', n.nid)
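The same trick can be tested outside MySQL; SQLite spells string concatenation as || instead of CONCAT, but the join condition works identically. A minimal sketch (Python's sqlite3, invented data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE url_alias (source TEXT, alias TEXT);
INSERT INTO node VALUES (131, 'Hello world');
INSERT INTO url_alias VALUES ('node/131', 'hello-world');
""")

# In MySQL the join condition is u.source = CONCAT('node/', n.nid);
# SQLite's || coerces the integer nid to text the same way.
row = con.execute("""
    SELECT n.title, u.alias
    FROM node AS n
    LEFT JOIN url_alias AS u ON u.source = 'node/' || n.nid
""").fetchone()
print(row)  # ('Hello world', 'hello-world')
```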
And the entire query (final version for importing pages):
SELECT n.uid AS author_id, '3' AS type, n.status, n.title, b.body_value AS body, '1' AS category, n.promote AS featured, n.sticky AS recommended, FROM_UNIXTIME(n.created) AS creation_date, FROM_UNIXTIME(n.changed) AS modification_date, u.alias AS short, n.language FROM node AS n LEFT JOIN url_alias AS u ON u.source = CONCAT('node/', n.nid) LEFT JOIN field_data_body AS b ON b.entity_id = n.nid WHERE n.type = 'page' GROUP BY n.nid
Unfortunately, this point revealed the biggest mistake in database structure and application logic design that I have made in the past few years. Some of you may already see it.
I had assumed that each piece of content in my new application would have one URL alias, and that each change to it (whether manual or automatic) would overwrite the last alias. Can you imagine? Someone mistakenly changes an alias (or, even worse, it changes automatically with each title change) and all the links on external pages and all the users' browser bookmarks suddenly become invalid! Madness!
To the migration
To convert each of the above queries into a Yii migration, we have to execute it, export the result set as a PHP array and tune it up a little. This part is covered by my [another article](Converting database structure or .sql file into Yii migration.txt).
For importing articles, I used exactly the same query as above, with only two exceptions:
- '3' AS type, replaced by '4' AS type, in the SELECT part,
- n.type = 'page' replaced by n.type = 'article' in the WHERE clause.
All other steps remained unchanged.
The first step in writing a Yii migration for importing Drupal events was the same as above:
- '4' AS type, replaced by '5' AS type, in the SELECT part,
- n.type = 'article' replaced by n.type = 'event' in the WHERE clause.
However, for events we need another (third) join, because an event's start and end dates are stored in the field_data_field_date table. Therefore, I needed to add:
LEFT JOIN field_data_field_date AS d ON d.entity_id = n.nid
as the last (third) join, and:
d.field_date_value AS event_start_date, d.field_date_value2 AS event_end_date,
to the SELECT part. Notice that we don't use FROM_UNIXTIME here, because event start and end dates are stored in field_data_field_date as date/time values, not as Unix timestamps.
And the final query for ripping events from Drupal database is:
SELECT n.uid AS author_id, '5' AS type, n.status, n.title, b.body_value AS body, '1' AS category, n.promote AS featured, n.sticky AS recommended, FROM_UNIXTIME(n.created) AS creation_date, FROM_UNIXTIME(n.changed) AS modification_date, d.field_date_value AS event_start_date, d.field_date_value2 AS event_end_date, u.alias AS short, n.language FROM node AS n LEFT JOIN url_alias AS u ON u.source = CONCAT('node/', n.nid) LEFT JOIN field_data_body AS b ON b.entity_id = n.nid LEFT JOIN field_data_field_date AS d ON d.entity_id = n.nid WHERE n.type = 'event' GROUP BY n.nid
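To sanity-check the shape of this final query, here is a runnable sketch against a mock of the four tables involved (SQLite via Python's sqlite3; table and column names follow the article, the data is invented, and the MySQL-only bits FROM_UNIXTIME and CONCAT are swapped for their SQLite equivalents):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, type TEXT,
    status INTEGER, title TEXT, promote INTEGER, sticky INTEGER,
    created INTEGER, changed INTEGER, language TEXT);
CREATE TABLE field_data_body (entity_id INTEGER, body_value TEXT);
CREATE TABLE url_alias (source TEXT, alias TEXT);
CREATE TABLE field_data_field_date (entity_id INTEGER,
    field_date_value TEXT, field_date_value2 TEXT);
INSERT INTO node VALUES (7, 42, 'event', 1, 'Meetup', 1, 0,
    1388534400, 1388534400, 'en');
INSERT INTO field_data_body VALUES (7, 'Come join us!');
INSERT INTO url_alias VALUES ('node/7', 'meetup');
INSERT INTO field_data_field_date VALUES
    (7, '2014-02-01 18:00:00', '2014-02-01 21:00:00');
""")

row = con.execute("""
    SELECT n.uid AS author_id, '5' AS type, n.status, n.title,
           b.body_value AS body, '1' AS category,
           n.promote AS featured, n.sticky AS recommended,
           datetime(n.created, 'unixepoch') AS creation_date,
           datetime(n.changed, 'unixepoch') AS modification_date,
           d.field_date_value AS event_start_date,
           d.field_date_value2 AS event_end_date,
           u.alias AS short, n.language
    FROM node AS n
    LEFT JOIN url_alias AS u ON u.source = 'node/' || n.nid
    LEFT JOIN field_data_body AS b ON b.entity_id = n.nid
    LEFT JOIN field_data_field_date AS d ON d.entity_id = n.nid
    WHERE n.type = 'event'
    GROUP BY n.nid
""").fetchone()
print(row)
```

One flat row per event, with all the Drupal-side pieces joined in, is exactly the structure the destination blog expects.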
What to do next?
If you're writing (as I do) applications in Yii 2, you can export the data to a
.php file and then use [these simple tricks](Converting database structure or .sql file into Yii migration.txt) to convert that file into a Yii migration.
If you're writing your code in a different framework or in pure PHP, you can use the
INSERT ... SELECT syntax to import data directly into your own database / data structure. More information about this syntax, in case you haven't used it before, can be found all across the Internet, for example at Stack Overflow, at W3Schools or here. Have fun!
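A minimal sketch of that pattern (SQLite via Python's sqlite3; the my_posts target table is hypothetical, standing in for whatever your own schema looks like):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT);
CREATE TABLE field_data_body (entity_id INTEGER, body_value TEXT);
-- hypothetical destination table in "your own" structure
CREATE TABLE my_posts (id INTEGER PRIMARY KEY, type TEXT,
    title TEXT, body TEXT);
INSERT INTO node VALUES (1, 'page', 'About'), (2, 'article', 'News');
INSERT INTO field_data_body VALUES (1, 'About us'), (2, 'Big news');

-- INSERT ... SELECT copies rows straight from the source tables
-- into the destination table in a single statement.
INSERT INTO my_posts (type, title, body)
SELECT '3', n.title, b.body_value
FROM node AS n
LEFT JOIN field_data_body AS b ON b.entity_id = n.nid
WHERE n.type = 'page';
""")

count = con.execute("SELECT COUNT(*) FROM my_posts").fetchone()[0]
print(count)  # 1 -- only the 'page' row was copied
```

Note that this only works when both databases are reachable from the same connection; with separate servers you are back to dump-and-import.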