Adding PDF Conversion to Your WordPress Website

2010
04.08

This is the second part of a two-part series that shows you how to convert WordPress posts into PDFs without a plug-in or module. In the first part, you learned the important concepts for implementation, the required PHP class, and a little about formulating the GET request that serves as input to postpdfcreator.php so it can grab the correct content from the MySQL database. In this final part, we will discuss how to completely integrate this application into your WordPress blogging/CMS platform. It is highly recommended that you read the first part to avoid confusion.

Importance of using proper title tag

In the first part, you learned how to formulate the GET request by extracting the post ID from the ID parameter in the URL; the PDF creator script then uses that ID to grab the necessary content from the MySQL database.

However, since most blogs use different permalink structures, the above method cannot be used. This leads us to the second and most important method of distinguishing post content; namely, the Title Tag.

In WordPress, you should have unique title tags. This helps not only your SEO (Google, for example, recommends the use of unique title tags), but your website readers/users as well.

When every post title is unique, you can use this input to your PDF creator script to properly grab the correct post content from the MySQL database. But first, you need to have a correct title tag in your WordPress blog. Some plug-ins can do this job, but the recommended way is by using a PHP script (no plug-ins for simplicity).

The default title tag in WordPress (no plug-in enabled) is this (can be found in header.php):

<title><?php wp_title('&laquo;', true, 'right'); ?> <?php bloginfo('name'); ?></title>

Other related templates may use a slight variation of this. But the main problem is that the above title is not optimized for use. For example, if the actual title of the post is “Hello World!” using the above script, the title that can be viewed in the browser is this:

Test  >> Blog Archive >> Hello World!

For this PDF creator project, you need a post title that exactly corresponds to the post title stored in the WordPress MySQL database. The strategy is to use the SELECT MySQL command in the PDF creator PHP script to get the post content for the corresponding unique post title. Below is the screen shot of the WordPress MySQL database (wp_posts table) for the most basic/default installation:

It shows that the post title is only “Hello World!” and in the same row, you can see the post content. So how can we formulate a correct title tag in WordPress that exactly corresponds to the post title stored in the MySQL database?

The following is the recommended PHP script that will provide the desired title tag output:

<?php
if (is_front_page()) {
        echo "<title>This is your homepage title tag</title>";
} elseif (is_single()) {
        echo "<title>";
        wp_title('', true);
        echo "</title>";
} elseif (is_page()) {
        echo "<title>";
        wp_title('', true);
        echo "</title>";
} elseif (is_archive()) {
        echo "<title>";
        wp_title('', true);
        echo "</title>";
} elseif (is_404()) {
        echo '<title>Page not found:  Yourdomain.com</title>';
} else {
        echo "<title>";
        bloginfo('name');
        echo "</title>";
}
?>

Copy the above code and replace the default PHP code for the title tag, which is:

<title><?php wp_title('&laquo;', true, 'right'); ?> <?php bloginfo('name'); ?></title>

After that, set the correct title tag for your home page. Edit “This is your homepage title tag” and replace it with your own home page title. Also, in the 404 title, “Page not found: Yourdomain.com,” replace Yourdomain.com with your own domain. So if your domain is “thisismydomain.com”, it will be:

Page not found: Thisismydomain.com

According to the WordPress documentation, wp_title() can return the title instead of printing it, so you can use it as input to the PDF creator script. To do this, pass false as the second argument so the title is not displayed in the browser, and store the result in a variable in single.php (the post template, discussed later):

$titletag=trim(wp_title('',false));

We can use this as input to the PDF creator script, but first you need to create your CLICK TO SAVE THE PDF VERSION OF THIS POST link.

Ideally, you can insert your “create a PDF” link anywhere in the post. It depends on your own template and preferences. However, for the purpose of this article and providing a standard method, we may wish to insert it at the bottom of the post.

Open single.php, and then find the code that looks similar to this:

<?php the_content( __('<p>Read the rest of this entry &raquo;</p>

Below the code above, insert this code:

<?php
$titletag=trim(wp_title('',false));
echo '<a rel="nofollow" href="http://www.php-developer.org/postpdfcreator.php?title='.$titletag.'"><b>CLICK TO SAVE THE PDF VERSION OF THIS POST</b></a>';
?>

Replace www.php-developer.org in the link with your own domain name.

IMPORTANT: Do not forget to place a rel=nofollow on the link to discourage search engines from following it to the PDF version; otherwise, you could end up with duplicate content issues.
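The link above places the raw title in the query string, so a title containing characters such as & or # could be cut short before it reaches the PDF script. A minimal sketch of one way to guard against that, assuming PHP 7 or later; the helper name build_pdf_link and the example.com address are hypothetical and not part of the original script:

```php
<?php
// Hypothetical helper (not part of the original script): builds the
// "save as PDF" link with the post title URL-encoded for the query string.
function build_pdf_link(string $scriptUrl, string $title): string
{
    // rawurlencode() makes characters such as spaces, & and # safe in a URL.
    $href = $scriptUrl . '?title=' . rawurlencode($title);

    // htmlspecialchars() keeps the attribute value valid HTML.
    return '<a rel="nofollow" href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
         . '<b>CLICK TO SAVE THE PDF VERSION OF THIS POST</b></a>';
}

// Example usage in single.php (assumed placement, placeholder domain):
// echo build_pdf_link('http://www.example.com/postpdfcreator.php', trim(wp_title('', false)));
```

PHP decodes query-string values automatically when it fills $_GET, so the receiving script can read $_GET['title'] without any extra decoding step.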

Now that you have created the PDF link, it is time to create your PHP PDF Creator script. The governing flowchart for the programming is this:

First, you need to connect to your WordPress MySQL database:

<?php
$username = "xxx";
$password = "xxx";
$hostname = "xxx";
$database = "xxx";
$dbhandle = mysql_connect($hostname, $username, $password)
    or die("Unable to connect to MySQL");
//select a database to work with
$selected = mysql_select_db($database,$dbhandle)
    or die("Could not select $database");

For the second step, you need to receive the GET request when users click on the “Click to save the PDF version of this post”:

$posttitle=trim($_GET['title']);

It is important to sanitize user inputs to avoid any possibility of MySQL injection:

$posttitle = mysql_real_escape_string(stripslashes($posttitle));

The third step is to query MySQL to extract the “published” post corresponding to the requested title.

$result = mysql_query("SELECT `post_content` FROM `wp_posts` WHERE `post_title`='$posttitle' AND `post_status`='publish'")
    or die(mysql_error());
$row = mysql_fetch_array($result)
    or die("Invalid query: " . mysql_error());

$content = $row['post_content'];
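The mysql_* functions used above were deprecated in PHP 5.5 and removed in PHP 7.0, so they will not run on current PHP versions. The same lookup can be sketched with PDO and a prepared statement, which also takes care of escaping the title. This is only a sketch under stated assumptions: PHP 7.1 or later, the default wp_posts table name, and a hypothetical helper name fetch_post_content that does not appear in the original script.

```php
<?php
// Hypothetical helper (not from the original script): returns the content of
// the published post whose title matches exactly, or null if none is found.
// Assumes the default WordPress table name "wp_posts".
function fetch_post_content(PDO $pdo, string $title): ?string
{
    // Placeholders keep user input out of the SQL text, so no manual escaping is needed.
    $stmt = $pdo->prepare(
        'SELECT post_content FROM wp_posts
         WHERE post_title = :title AND post_status = :status
         LIMIT 1'
    );
    $stmt->execute([':title' => $title, ':status' => 'publish']);

    // fetchColumn() returns false when no row matched.
    $content = $stmt->fetchColumn();
    return $content === false ? null : (string) $content;
}

// Example usage (connection details are placeholders, as in the article):
// $pdo = new PDO('mysql:host=xxx;dbname=xxx;charset=utf8', 'xxx', 'xxx');
// $content = fetch_post_content($pdo, trim($_GET['title'] ?? ''));
```

Returning null for a missing post lets the calling script decide how to respond, for example with a "post not found" message instead of the misleading "Invalid query" error.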

The fourth step is to assign the post content to a PHP variable as well as to strip HTML tags.

$content= strip_tags($content);

$message = "Post title: <b>$posttitle</b>

Author: www.php-developer.org

$content

Copyright © 2009 www.php-developer.org";

Replace www.php-developer.org in the message with your own domain.

The fifth step is to integrate the R & OS PDF Creator class:

include ('class.ezpdf.php');

If you upload this to the root directory of a hosting server, the above path will change. It depends on your hosting configuration; some may use a path like this:

include ('/home/content/phpdeveloper/html/class.ezpdf.php');

If you are confused in determining the exact path of your root directory, upload this test script (save as rootpath.php) to your root directory and execute it in the browser to see the results:

<?php

echo $_SERVER['SCRIPT_FILENAME'];

?>

For example, when this script is uploaded to the root directory of the website http://www.php-developer.org, you get the following result:

/home/www/php-developer.org/rootpath.php  

So when the class.ezpdf.php is uploaded to the root directory, the path will be:

/home/www/php-developer.org/class.ezpdf.php  

So the correct PHP include syntax to include this PDF creator class will be:

include ('/home/www/php-developer.org/class.ezpdf.php');
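If you would rather not hard-code the server path, PHP's __DIR__ constant (available since PHP 5.3) gives the directory of the current script. A small sketch, assuming PHP 7 or later and that class.ezpdf.php sits in the same directory as the PDF creator script; the helper name path_beside_script is hypothetical:

```php
<?php
// Hypothetical helper: builds an absolute path relative to the directory
// that contains this script, so no server-specific path has to be typed in.
function path_beside_script(string $relative): string
{
    return __DIR__ . '/' . ltrim($relative, '/');
}

// Example usage (assumes the class file sits next to this script):
// include path_beside_script('class.ezpdf.php');
```

The same helper could be used for the font path in the next step, as long as the fonts folder is uploaded alongside the script.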

The sixth step is to create the PDF object (if your script does not already do so) and customize fonts:

$pdf = new Cezpdf();
$pdf->selectFont('/home/www/php-developer.org/fonts/Courier.afm');
$pdf->ezText($message,11);

Using an image is optional.

Finally, to stream the PDF to the browser, execute this function and close the MySQL connection:

$pdf->ezStream();

mysql_close($dbhandle);

?>

Now that you understand the concept and scripts behind the creation of a PDF for your WordPress post, it is time to implement it on a real website:

Step 1. Upload the following files to your website/WordPress root directory:

a. fonts (folder)

b. class.ezpdf.php

c. class.pdf.php

You can download the raw source files here: http://cid-c3bc6a3c5463e218.skydrive.live.com/self.aspx/.Public/pdfclassrOS.zip, or at the author’s website: http://www.ros.co.nz/pdf/

Step 2. Customize your postpdfcreator.php (you can download the complete script here: http://www.php-developer.org/wp-content/uploads/scripts/postpdfcreator.txt).

Put in the correct database information, path of the files, customize the PDF output, etc. And then upload postpdfcreator.php to the root directory of your WordPress website.

Step 3. Put a hyperlink in your WordPress template:

Insert this code below <?php the_content( __('<p>Read the rest of this entry &raquo;</p>

<?php
$titletag=trim(wp_title('',false));
echo '<a rel="nofollow" href="http://www.php-developer.org/postpdfcreator.php?title='.$titletag.'"><b>CLICK TO SAVE THE PDF VERSION OF THIS POST</b></a>';
?>

Step 4. Block postpdfcreator.php in robots.txt to avoid duplicate content. Add this line to your robots.txt, under the appropriate User-agent line:

Disallow: /postpdfcreator.php

Step 5. Test your WordPress and see if the output corresponds to your expectations; you can tweak it further.

You can see a sample implementation in this website: http://www.php-developer.org. Look at any post and you will see the link on the bottom. This PDF integration is very basic and only prints the text content of the post; it does not include images. 

DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.

More Onsite SEO from Google SEO Report Card

2010
04.08

This is the second part of a two-part series on what we can do to improve our onsite SEO, based on Google’s recent SEO Report Card of some of its own sites. In this part, you’ll learn the remaining three onsite SEO factors that can help your site achieve higher rankings.

In the first part, you learned about the following important factors included in the Google SEO Report Card:

1. Title tag format and length

2. Showing related snippets in search results

3. Effective use of sitelinks

4. Duplicate content check: clear main page result

In this second part, we will cover the remaining important onsite SEO factors included in the Google SEO Report Card:

5. Importance of URL canonicalization

6. Effective use of the Header tag

7. Use of logo image alt text

All in all, there are seven important onsite items checked by Google in its SEO Report Card.

Importance of URL Canonicalization

Google strongly emphasizes the importance of URL canonicalization in their SEO Report Card. There is a lot of important information pertaining to URL canonicalization to keep in mind.

For example, if the same content is accessible at different URLs, its “reputation” is split among them, which hurts the ranking of the canonical URL.

A classic example of this occurs with a website’s home page URL, for example http://www.yourdomain.com. Say you have engaged in some link building efforts which help this URL earn some reputation. Time passes, and eventually there are a total of 45 organic links pointing to that URL alone (remember, this is just an example).

Then there comes a time when you decide to make some improvements to your home page, like updating its design to make it more trendy and attractive. After the design has been completed, you upload a new home page file, for example home.html, so your home page URL now becomes:

http://www.yourdomain.com/home.html

You also update all of your website’s internal links to point to http://www.yourdomain.com/home.html instead of the previous home page URL. And then, since your home page URL is new, you decide to promote it by getting some links; thus, the new home page URL earns another 20 new organic links from your effort.

And now, here comes the main problem: you notice that your rankings drop, even though your website is earning some new organic links.

From an SEO point of view, your site is facing serious canonical issues because you have two home page URLs that are crawlable and indexable by Google: http://www.yourdomain.com/home.html and http://www.yourdomain.com/, and both are earning links.

See the screen shot below for the canonical issues:

What if you later decide that http://www.yourdomain.com/ is your canonical URL? Of course, there will be issues, because http://www.yourdomain.com/home.html has the same content as http://www.yourdomain.com/. You will face some challenges pushing the rankings up for your canonical URL, since the split in reputation has sent some of its organic links to /home.html.

In its SEO Report Card, Google recommends fixing problems like this by copying the design change from /home.html to /. Once that is completed, you should do a “301 redirect” from http://www.yourdomain.com/home.html to http://www.yourdomain.com/.

If you are on an Apache server with .htaccess enabled, you can accomplish this task with this line:

Redirect 301 /home.html http://www.yourdomain.com/

According to Google’s report, 301 redirecting other URLs to the canonical URLs will combine all split-up reputations into one (all links earned will be combined to a single URL), making the canonical URL stronger. This means it will rank better in Google. So the above screen shot will now become (after 301 redirection):

Those 20 links will now be added to the canonical URL, increasing the number of links, and canonicalization of URLs will translate to better rankings.

The second point you should keep in mind is that you need to be careful of other common canonical issues. Even Google experiences a problem with those, as stated in their Google SEO Report Card.

Here’s a common example: http://www.google.com/books (this is the canonical URL). However, the URL with the trailing slash also returns a 200 OK Header status: http://www.google.com/books/

Google’s recommendation to itself is to 301 redirect http://www.google.com/books/  to http://www.google.com/books, to avoid splitting reputation. This has the effect of combining two URLs into one, like the case mentioned previously.

Google does examine the effective use of header tags with respect to SEO. According to the Google SEO Report Card, “Using semantic markup like heading tags can provide search engines with useful information about how your document is structured that wouldn’t be possible with plain text.”

So what can be learned from the SEO Report Card on the effective use of header tags?

First, structure your content using H1 tags first, then H2, and then H3. Wikipedia, for example, has a very good content structure using H1 and H2 tags. See screen shot below:

Second, you should have an H1 tag on the page. Google emphasizes the use of the H1 tag among other header tags. A good comment from Google regarding H1 tags found in the SEO Report Card:

While styling your text so it appears larger might achieve the same visual presentation, it does not provide the same semantic meaning to the search engine that an <h1> tag does. The product’s name and/or a few words about its features are great to have in an <h1> tag for the product main page.

Third, avoid using no header tags at all. It seems that header tags are important to Google, and they even suggest adding header tags to their own product pages as stated in the report.

The last important onsite SEO item Google checked in its SEO Report Card is the use of alt tags in the images. What follows are some important guidelines Google uses that pertain to alt tags in images.

First, you should avoid using empty alt tags for images.

Second, when using alt tags, you should use relevant, accurate and descriptive alt text. For example, instead of using the alt text “Alaska Vacation,” you can make it more descriptive and meaningful, for example: “Alaska Vacation – Whale watching photos”

Third, you should leverage the power of internal links by using alt text in image links. If you use a descriptive and accurate alt text, it will serve as anchor text and will help the user to determine what, exactly, the linked page is all about.

For example:

The exact source code of the encircled box above:

<a href="http://www.php-developer.org/strategy-to-get-all-urls-in-the-blogger-or-any-website/"><img title="Strategy to get all URLs in the Blogger or any website" alt="Strategy to get all URLs in the Blogger or any website" src="http://www.php-developer.org/wp-content/themes/arras-theme.1.3.5/library/timthumb.php?src=http://www.php-developer.org/wp-content/uploads/2010/02/xenucrawler.jpg&amp;w=190&amp;h=100&amp;zc=1"></a>

We’ll start by reviewing the objectives set in the first part. There were two of those:

1. List the important things that Google checks during their SEO onsite analysis.

2. Evaluate what Google’s SEO Report Card contains that SEO practitioners normally ignore.  

For the first objective, we learned that Google basically checks seven important onsite items in their SEO Report Card:  

1. Title tag format and length

2. Showing related snippets in search results

3. Effective use of sitelinks

4. Duplicate content check: clear main page result

5. Importance of URL canonicalization

6. Effective use of Header tag

7. Use of logo image alt text

As to the second objective, there are some things that might be missed by common SEO practitioners.

First, some SEOs completely dismiss meta descriptions. This is NOT bad (as shown in the analysis in the first part); however, there might be some important pages on your website that do not have ample content that Google can grab as nice and related snippets. For example, consider what Google does in their SEO Report Card (e.g., http://www.google.com/mapmaker); they used meta descriptions only in some special parts of their website, to give a better user experience in the search result.

Second, some SEOs ignore and fail to improve the sitelinks of their clients, which lowers click-through from search results and, in turn, effective traffic. Google recommends that you tweak sitelinks using Google Webmaster Tools.

Third, some SEOs dismiss the idea that duplicate content causes a penalty in rankings, and the normal reaction is to ignore these issues. Surprisingly, however, the Google SEO Report Card states that URL canonicalization is needed, since “split” reputation or “link juice” can affect rankings.

Here is the exact comment from Google: “Prevent dilution of reputation – If the same content is accessible through multiple URLs, this could cause duplicate content. This content may rank worse because its reputation is spread over multiple URLs.”

It might not be, strictly speaking, a “penalty,” but it affects the URL’s ranking.

Finally, some SEOs might not include header tags in their optimization, thinking they have a low impact on rankings. Even Google, however, plans to improve its own header tags to give its content a better structure, which results in a better user experience as well as improved relevance in the search results.

Microsoft and Google Attempting to Follow Twitter’s Lead

2010
04.08

The extreme success and popularity obtained by the social networking site known as Twitter has caused other companies to take notice.  Among those companies are technology giants Microsoft and Google, who have both seen Twitter become highly successful and plan to follow suit…but not, perhaps, in the way you might be thinking.

The way in which they will follow suit has a twist to it.  Rather than looking to social networking as a fun medium, Microsoft and Google are looking to create programs that incorporate social networking to help companies increase efficiency and productivity.

Microsoft is testing social networking in the workplace via its OfficeTalk application.  OfficeTalk is not available to the public, as it is more of a research project.  With OfficeTalk, employees can post ideas, updates, and the like in an effort to share information with others within the organization through microblogging, just as Twitter does, but in more of a social form. 

Microsoft is testing the effectiveness of such a product on itself first by making OfficeTalk available to its own employees.  The company is hoping to find out how employees will use the program and what features would be helpful.  After using itself as a guinea pig, Microsoft will then shift the focus of the program to other organizations to get a better sample size and more ideas on how to improve OfficeTalk.  This is essential; businesses differ, and what might work for a company like Microsoft might not work for another type of company.

Not to be outdone, Google is also testing a similar social-networking-in-the-workplace concept with its Google Buzz program.  Google Buzz is a program that works in conjunction with Gmail to give users a comprehensive view of their friends’ information.  For instance, they can view friends’ statuses, photos, videos, and links all in one place.  If this sounds familiar, it should, as Facebook basically uses this model.  Google hopes that Buzz will be more useful for organizations and employees, however.

Buzz also integrates several existing social networking sites into its platform, such as Flickr, YouTube, Picasa, and more. Unlike Microsoft’s OfficeTalk, Google Buzz was actually made public on February 9, 2010.  While the idea sounds good, the program has already received criticism for its perceived lack of privacy in sharing too much personal information with other users.

One has to wonder, why doesn’t Twitter come up with a workplace-based program? They could be developing one, and at the moment they do offer private accounts to keep information safe and enclosed.  Whether or not a full-fledged corporate program gets released in the future is something to be determined. 

Besides the already-diagnosed problems with privacy as seen with the Google Buzz program, other questions and concerns could arise concerning the development and use of such a social networking program within a business.  Primarily, will it cause too much of a distraction?  With many people’s lives seemingly consumed by text messaging, Facebook, Twitter, and the like, will the implementation of such a system at work actually decrease efficiency? The possibility seems to be very high. If developers can provide a solution that does increase workplace efficiency without causing a distraction, however, it will be worth its weight in gold.