Recently I wanted to create a page that would list the backlinks to certain sites that I had chosen. Yahoo always seems to have the most comprehensive list of backlinks, and luckily they also make available a series of APIs that enable you to access their data. Using the Site Explorer Inbound Links API, and an example of a script towards the bottom of that page, I was able to put together a script that closely mirrors the sort of results you would get if you entered a site’s URL into Yahoo’s Site Explorer.
The only problem with these results is that you often end up with lots of links from the same domain if, for example, you have a link to your site in a forum signature, or the link appears in a sidebar of a blog with lots of pages. I wanted to manipulate the results sent back from Yahoo to only display the first one or two links from any one domain regardless of how many links there were.
I started first with the example from Yahoo used to extract the information (you’ll need to get an API key from Yahoo to get this to work on your own site):
```php
$api_service_url = "http://search.yahooapis.com/SiteExplorerService/V1/inlinkData";
$apiid = "your_api_key_goes_here:_get_it_from_yahoo";
$query = $input_url; // can be hard-coded or receive a value from a function
$entire_site = ""; // "1" to provide results for the entire site
$omit_inlinks = "domain";
$linksperrequest = 100; // 100 is max value
$startposition = 1;
$request_url = $api_service_url."?appid=".$apiid."&query=".urlencode($query)."&entire_site=".$entire_site."&omit_inlinks=".$omit_inlinks."&output=php";
$currentpos = 0;
while ($currentpos++ >= 0) {
    $start = ($currentpos-1)*$linksperrequest+$startposition;
    if ($start + $linksperrequest - 1 > 1000) break; // Yahoo won't return results past 1000
    $requrl = sprintf("%s&start=%s&results=%s", $request_url, $start, $linksperrequest);
    if (($content = file_get_contents($requrl)) === FALSE) {
        echo "HTTP error: $requrl";
        exit;
    } else {
        $data = unserialize($content);
        if (is_array($data) && array_key_exists("ResultSet", $data)) {
            for ($i=0; $i<sizeof($data["ResultSet"]["Result"]); $i++) {
                $url = $data["ResultSet"]["Result"][$i]["Url"]; // backlink URL
                $title = $data["ResultSet"]["Result"][$i]["Title"]; // page title for the backlink
            }
        } else {
            echo "Error: Bad response from server";
            break; // no ResultSet to count, so stop paging
        }
        if (sizeof($data["ResultSet"]["Result"]) < $linksperrequest) break;
    }
}
```
The API only returns up to 100 results per request, so the script cycles through the results 100 at a time until it reaches the end (Yahoo will not return anything past the 1,000th result for a query). As it does so, the URL and page title of each page linking back to you are captured with:
```php
$url = $data["ResultSet"]["Result"][$i]["Url"]; // backlink URL
$title = $data["ResultSet"]["Result"][$i]["Title"]; // page title for the backlink
```
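The paging arithmetic in that loop can also be pulled out into a small helper that lists every request URL up front. This is a sketch under assumptions: `yahoo_page_urls()` is a hypothetical name, it builds the query string with PHP’s `http_build_query()` rather than string concatenation, and it applies the rule quoted from Yahoo’s documentation in the comments below (start + results - 1 may not exceed 1,000).

```php
<?php
// Hypothetical helper: build one request URL per page of results.
// Assumes Yahoo's documented rule that start + results - 1 may not exceed $max.
function yahoo_page_urls($base_url, array $params, $per_request = 100, $max = 1000) {
    $urls = array();
    for ($start = 1; $start + $per_request - 1 <= $max; $start += $per_request) {
        // array union: append the paging parameters after the fixed ones
        $query = $params + array('start' => $start, 'results' => $per_request);
        $urls[] = $base_url . '?' . http_build_query($query, '', '&');
    }
    return $urls;
}

$urls = yahoo_page_urls(
    'http://search.yahooapis.com/SiteExplorerService/V1/inlinkData',
    array(
        'appid'        => 'your_api_key_goes_here',
        'query'        => 'http://www.example.com',
        'entire_site'  => '',
        'omit_inlinks' => 'domain',
        'output'       => 'php',
    )
);
echo count($urls), "\n"; // 10 (start=1, 101, ..., 901)
```

A caller would still break out early, as the original loop does, once a page comes back with fewer than 100 results.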
Originally, I thought I might be able to use PHP’s parse_url function to extract just the domain portion of each link and then feed that into array_unique to remove all the duplicate occurrences of a domain. But that didn’t let me set a limit on how many links to permit from each domain, and it also threw away the unique portion of each link (i.e., everything after the domain name).
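To make the problem concrete, here is a small sketch using made-up sample URLs (hypothetical data, not real Yahoo output). Passing the host names to `array_unique()` leaves one bare domain per site, so both the paths and the per-domain counts are lost; grouping the full URLs by host keeps everything, which is the idea the modified script uses.

```php
<?php
// Hypothetical sample backlinks (not real API output)
$sample = array(
    'http://forum.example.com/thread/1',
    'http://forum.example.com/thread/2',
    'http://forum.example.com/thread/3',
    'http://blog.example.org/post/hello',
);

// Approach 1: array_unique on the host names alone
$hosts = array();
foreach ($sample as $url) {
    $hosts[] = parse_url($url, PHP_URL_HOST);
}
$unique_hosts = array_values(array_unique($hosts));
echo implode(', ', $unique_hosts), "\n"; // forum.example.com, blog.example.org

// Approach 2: group the full URLs under their host so a per-domain limit can be applied later
$grouped = array();
foreach ($sample as $url) {
    $grouped[parse_url($url, PHP_URL_HOST)][] = $url;
}
echo count($grouped['forum.example.com']), "\n"; // 3
```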
So I modified the Yahoo script like so:
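```php
// $backlinks = array(); needs to be set once, before the while loop starts
for ($i=0; $i<sizeof($data["ResultSet"]["Result"]); $i++) {
    $url = $data["ResultSet"]["Result"][$i]["Url"]; // backlink URL
    $title = $data["ResultSet"]["Result"][$i]["Title"]; // page title for the backlink
    $domain = 'http://'.parse_url($url, PHP_URL_HOST); // e.g. http://forum.example.com
    $backlinks[$domain][] = array($url, $title); // group every link under its domain
}
```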
Then, with a bit of help from Tony Aslett and Chris..S at CSS Creator, I added:
```php
define('BACKLINK_LIMIT', 2);    // domains with more links than this get truncated
define('BACKLINK_TRUNCATE', 1); // how many links to keep from those domains
define('BACKLINK_ALL', 0);      // 0 = print every link

// Trim domains that have too many links
foreach ($backlinks as $domain => $links) {
    if (count($links) > BACKLINK_LIMIT) $backlinks[$domain] = array_slice($links, 0, BACKLINK_TRUNCATE);
}
ksort($backlinks); // sort by domain name

function print_backlinks($domain, $links, $num) {
    $limit = $num ? min($num, count($links)) : count($links);
    for ($i = 0; $i < $limit; $i++) {
        list($url, $title) = $links[$i];
        echo '<li><a href="'.htmlspecialchars($url).'">'.htmlspecialchars($title).'</a></li>';
    }
}

echo '<ul>';
foreach ($backlinks as $domain => $links) {
    if (count($links) > BACKLINK_LIMIT) {
        print_backlinks($domain, $links, BACKLINK_TRUNCATE);
    } else {
        print_backlinks($domain, $links, BACKLINK_ALL);
    }
}
echo '</ul>';
```
This defines three constants:
- BACKLINK_LIMIT: the number of links from one domain beyond which I want to cut the list down;
- BACKLINK_TRUNCATE: how many links to display from domains that have more than BACKLINK_LIMIT links;
- BACKLINK_ALL: set to 0, which print_backlinks treats as "display every link".
Next, a foreach loop goes through the grouped results from the Yahoo API, and for any domain with more links than BACKLINK_LIMIT, array_slice discards every link after the first BACKLINK_TRUNCATE.
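The filtering step on its own can be sketched as a pure function, which makes it easy to check without calling the API. `limit_per_domain()` is a hypothetical name; it takes the same `$domain => array(array($url, $title), ...)` structure the grouping loop builds, and its `$limit` and `$truncate` defaults mirror BACKLINK_LIMIT and BACKLINK_TRUNCATE.

```php
<?php
// Hypothetical helper: trim over-represented domains.
// $backlinks has the shape built earlier: domain => list of array($url, $title).
function limit_per_domain(array $backlinks, $limit = 2, $truncate = 1) {
    foreach ($backlinks as $domain => $links) {
        if (count($links) > $limit) {
            // keep only the first $truncate links from this domain
            $backlinks[$domain] = array_slice($links, 0, $truncate);
        }
    }
    return $backlinks; // the caller's array is left untouched
}

$backlinks = array(
    'http://forum.example.com' => array(
        array('http://forum.example.com/t/1', 'Thread 1'),
        array('http://forum.example.com/t/2', 'Thread 2'),
        array('http://forum.example.com/t/3', 'Thread 3'),
    ),
    'http://blog.example.org' => array(
        array('http://blog.example.org/a', 'Post A'),
        array('http://blog.example.org/b', 'Post B'),
    ),
);
$filtered = limit_per_domain($backlinks);
echo count($filtered['http://forum.example.com']), "\n"; // 1 (had 3, over the limit of 2)
echo count($filtered['http://blog.example.org']), "\n";  // 2 (at the limit, left alone)
```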
After sorting the filtered array, a second foreach loop prints the results with a function called print_backlinks. This time, if a domain has more links than the chosen limit, BACKLINK_TRUNCATE is passed to the function, which uses that value as the upper bound for its loop rather than looping through all of that domain’s links. (Because the first loop has already trimmed those domains, this check mainly acts as a safeguard.)
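For testing, it can help to have the output step return a string instead of echoing it. The sketch below is a hypothetical variant, `render_backlinks()`, not part of the original script: it sorts the domains with `ksort()`, prints up to `$num` links per domain (0 meaning all, like BACKLINK_ALL), and escapes URLs and titles with `htmlspecialchars()` so a title containing `&` or `<` cannot break the markup.

```php
<?php
// Hypothetical variant of print_backlinks() that returns the whole list as a string.
// $num = 0 means "print every link", mirroring BACKLINK_ALL.
function render_backlinks(array $backlinks, $num = 0) {
    ksort($backlinks); // order the domains alphabetically
    $html = '<ul>';
    foreach ($backlinks as $domain => $links) {
        $limit = $num ? min($num, count($links)) : count($links);
        for ($i = 0; $i < $limit; $i++) {
            list($url, $title) = $links[$i];
            $html .= '<li><a href="' . htmlspecialchars($url) . '">'
                   . htmlspecialchars($title) . '</a></li>';
        }
    }
    return $html . '</ul>';
}

$html = render_backlinks(array(
    'http://b.example.com' => array(array('http://b.example.com/x', 'B & co')),
    'http://a.example.com' => array(array('http://a.example.com/y', 'A page')),
));
echo $html, "\n";
```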
The end result should now look like:
```php
// Constants and the helper sit outside backLink() so the function can be
// called once per site on the same page without redeclaration errors.
define('BACKLINK_LIMIT', 2);
define('BACKLINK_TRUNCATE', 1);
define('BACKLINK_ALL', 0);

function print_backlinks($domain, $links, $num) {
    $limit = $num ? min($num, count($links)) : count($links);
    for ($i = 0; $i < $limit; $i++) {
        list($url, $title) = $links[$i];
        echo '<li><a href="'.htmlspecialchars($url).'">'.htmlspecialchars($title).'</a></li>';
    }
}

function backLink($input_url) {
    $api_service_url = "http://search.yahooapis.com/SiteExplorerService/V1/inlinkData";
    $apiid = "your_api_key_goes_here";
    $query = $input_url;
    $entire_site = "";
    $omit_inlinks = "domain";
    $linksperrequest = 100;
    $startposition = 1;
    $request_url = $api_service_url."?appid=".$apiid."&query=".urlencode($query)."&entire_site=".$entire_site."&omit_inlinks=".$omit_inlinks."&output=php";
    $backlinks = array(); // domain => list of array($url, $title)
    $currentpos = 0;
    while ($currentpos++ >= 0) {
        $start = ($currentpos-1)*$linksperrequest+$startposition;
        // Yahoo rejects requests where start + results - 1 exceeds 1000
        if ($start + $linksperrequest - 1 > 1000) break;
        $requrl = sprintf("%s&start=%s&results=%s", $request_url, $start, $linksperrequest);
        if (($content = file_get_contents($requrl)) === FALSE) {
            echo "HTTP error: $requrl";
            break; // stop paging and show whatever was collected
        }
        $data = unserialize($content);
        if (!is_array($data) || !array_key_exists("ResultSet", $data)) {
            echo "Error: Bad response from server";
            break;
        }
        for ($i=0; $i<sizeof($data["ResultSet"]["Result"]); $i++) {
            $url = $data["ResultSet"]["Result"][$i]["Url"];
            $title = $data["ResultSet"]["Result"][$i]["Title"];
            $domain = 'http://'.parse_url($url, PHP_URL_HOST);
            $backlinks[$domain][] = array($url, $title);
        }
        if (sizeof($data["ResultSet"]["Result"]) < $linksperrequest) break;
    }
    // Trim domains with more than BACKLINK_LIMIT links down to BACKLINK_TRUNCATE
    foreach ($backlinks as $domain => $links) {
        if (count($links) > BACKLINK_LIMIT) $backlinks[$domain] = array_slice($links, 0, BACKLINK_TRUNCATE);
    }
    ksort($backlinks); // sort by domain name
    echo '<ul>';
    foreach ($backlinks as $domain => $links) {
        if (count($links) > BACKLINK_LIMIT) {
            print_backlinks($domain, $links, BACKLINK_TRUNCATE);
        } else {
            print_backlinks($domain, $links, BACKLINK_ALL);
        }
    }
    echo '</ul>';
}
```
So although the output of this script closely resembles what you would get from Yahoo’s Site Explorer, for me it has three advantages:
- By looping through an array or the rows of a database select query, you can display the backlink results for any number of sites, not just one, in the same location or even on the same page.
- It lets me not only customise the output of the results but also display it in a template of my own choosing, which means it can be integrated into client-only sections of websites.
- It enables me to filter out lots of repetitive backlinks to make the overall display more readable, and therefore more usable.
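Because Yahoo has since shut the Inbound Links API down (see John’s reply to Matt in the comments below), the parsing loop can no longer be tried against live data. A sketch that fakes one `output=php` response with `serialize()` exercises the grouping step offline. The fixture shape (`ResultSet`, then `Result`, then `Url`/`Title`) is assumed from the keys the script reads, not taken from Yahoo’s documentation, and `group_by_domain()` is a hypothetical name.

```php
<?php
// Hypothetical helper: the grouping loop from backLink(), fed a string instead of an HTTP response.
function group_by_domain($content, array $backlinks = array()) {
    $data = unserialize($content);
    if (!is_array($data) || !array_key_exists('ResultSet', $data)) {
        return $backlinks; // bad response: leave the existing groups untouched
    }
    foreach ($data['ResultSet']['Result'] as $result) {
        $domain = 'http://' . parse_url($result['Url'], PHP_URL_HOST);
        $backlinks[$domain][] = array($result['Url'], $result['Title']);
    }
    return $backlinks;
}

// Fake response in the shape the script expects (assumed, not real Yahoo output)
$fake = serialize(array('ResultSet' => array('Result' => array(
    array('Url' => 'http://forum.example.com/t/1', 'Title' => 'Thread 1'),
    array('Url' => 'http://forum.example.com/t/2', 'Title' => 'Thread 2'),
    array('Url' => 'http://blog.example.org/a',   'Title' => 'Post A'),
))));

$groups = group_by_domain($fake);
echo implode(', ', array_keys($groups)), "\n"; // http://forum.example.com, http://blog.example.org
```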
Tyssen Design
Randy | July 4th, 2008 at 7:16 am
I’m not a coder so bear with me but I stumbled on this script when searching for a way to use the yahoo api to download more than 1000 links. It looks like it will work, I’m just having a tough time understanding where I put in my site name that I’d like to query. I see the yahoo api key portion but, again, not being able to php my way out of a wet paper bag, I’m stumped as to where to insert my site name. Thanks for the help!
John Faulds | July 4th, 2008 at 9:39 am
Hi Randy, the function, backLink($input_url), requires a URL as input. You can call it from anywhere in your page with backlink('http://www.example.com'), or, as in my case, use a value from a database query as it loops through the records.
The Frosty | July 25th, 2008 at 5:15 pm
Sweet
J0ny | July 31st, 2008 at 5:25 am
Very cool man, I just stumbled on this, and it makes me want to jump back into PHP again…
Yahoo exp can get annoying how it always lists its own domain…
Oscar | August 20th, 2008 at 1:50 am
It seems that an error happens for sites with more than 1,000 back links. Every time I try with such a site I get this error message:
failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request
And it always happens when start=1001. For sites with fewer links, it always works though. Am I doing something wrong, or is there a problem handling more than 1,000 back links?
John Faulds | August 20th, 2008 at 8:00 am
@Oscar, the limit for links per request is 100 (set by Yahoo) so you can’t make $linksperrequest more than that. But the script is set up so that it will keep being submitted until no more links are found and the results are appended to each other.
Oscar | August 20th, 2008 at 10:32 pm
But Yahoo’s official API documentation says that:
“The starting result position to return (1-based). The finishing position (start + results – 1) cannot exceed 1000.”
The script increments the start value in a loop, and every time it reaches a total of over 1,000 (let’s say start=1001&results=100) it crashes for me, because Yahoo is returning a 400 Bad Request error.
I appreciate your help, but I just can’t see how the script would circumvent Yahoo’s 1,000 limit. Have you tested it with a site that has more than 1,000 links?
John Faulds | August 20th, 2008 at 11:26 pm
@Oscar: you’re quite right – I haven’t tested it on a site with more than 1000 links and wasn’t aware of that part of the API docs. Sorry for the confusion.
Uploadmega | November 5th, 2008 at 9:30 am
Very useful article. It is easier to get backlinks from Yahoo than from Google.
Link Building Blog | January 15th, 2009 at 3:49 pm
How do sites like linkdiagnosis.com return thousands of backlink results? They say the Yahoo API is used, but I’m kinda stumped.
John Faulds | January 15th, 2009 at 4:10 pm
You’d have to ask them. Looks like an interesting tool though.
Enny Zeny | January 24th, 2009 at 10:47 pm
Why my domain not yet pagerank?
John Faulds | January 25th, 2009 at 12:46 pm
Enny, this article’s about Yahoo’s Site Explorer API; pagerank is a Google feature – you’d have to ask them.
Maciej | January 26th, 2009 at 6:07 am
Hi,
I’ve come across this tool today… Really nice one!
However when I downloaded it and installed at the server (with proper AppId), I keep getting error saying: “failed to open stream: HTTP request failed! HTTP/1.1 999 Rate Limit Exceeded in /inlinks/backlink.php on line 15”
I know this is connected with query limit but even if I change IP (I have dynamic IP connection) nothing happens… Tried this several times and doesn’t work :/ Any ideas what’s wrong?
Armand | April 2nd, 2009 at 3:33 am
Well, I guess I’ll parse the XML data using SimpleXML Class on PHP 5, it’s simple and easier. Anybody knows the similar API for Google instead?
Bob | May 20th, 2009 at 3:55 pm
Many thanks.
This is great code that I’m looking for, and thanks to Yahoo for the API.
BackLinkStat | June 9th, 2009 at 8:23 pm
Try BackLinkStat.com – Get detailed backlinks report of your site for FREE!
Thomas | June 19th, 2009 at 6:00 am
@oscar
try the code to stop crashes:
while ( ($currentpos++ >= 0) && ( (($currentpos-1)*$linksperrequest+$startposition) < 999) ) {
works fine for me.
Martin | August 29th, 2009 at 2:24 am
Many thanks. You saved me time! Good job guys!!
crohole | August 29th, 2009 at 5:27 pm
I have tried it with my website URL, but it gives an error because it can’t detect more than 1000 links. Can you fix it so the crawl can detect 1000 links or more? Please tell me. Thanks for the good article.
neel | December 11th, 2009 at 6:13 pm
Excellent script thanks for sharing it. I will try this on my website.
Henrik | January 5th, 2010 at 8:42 am
I wrote an asp.net version.
Thanks for the inspiration.
John Faulds | January 5th, 2010 at 8:53 am
Nice work Henrik.
Henrik | January 19th, 2010 at 1:25 am
Thanks John. Great article.
It’s a nice tool to have.
chris | March 1st, 2010 at 12:14 am
Site Explorer lets you combine other terms to get better results, e.g. -site: lets you exclude a domain’s internal links, and I think by doing searches with a few terms like “-inurl:a”, then combining them and removing duplicates, you can get around the 1000 backlink limit.
sheetal | April 10th, 2010 at 2:14 am
Nice Article.
I need a help in this little bit more.
I’ll type in a URL of the site I want the result of, then
1) first it will go to Yahoo Site Explorer to see all of its backlinks and store them, then it will go to each site in that list and do the same thing
actually the first bit where I put the url in it should also have an input for “depth to crawl”
and that should tell the script how many times it should do this
so if depth is 1
it would store all the sites found for the URL and 1 level below
so the same for each of the sites in that list.
Any help for this will really appreciated.
Thanks
instinctis | May 30th, 2010 at 1:48 pm
Great points you’ve posted here, thanks. It is completely true that Yahoo is a great engine for checking your backlinks / InLinks (and a little more) as it seems more accurate than others when it comes to InLinks… anyway, I used to do the queries manually until I found this free tool: http://ministatus.com which also allows one to download the results in a pdf file and also generates the free badge price for each and every queried website.
Al Manchester | May 31st, 2010 at 7:11 am
I think that since you wrote the code Yahoo has changed the API, as you can now ask it to filter out any links from that chosen domain. I found your code really useful though.
Does anyone have any experience with the daily limits that Yahoo imposes? Is there anything that can be done to increase them?
John Faulds | May 31st, 2010 at 8:00 am
Thanks for the update Al. I actually haven’t done much with this code or the Yahoo API since writing the article.
Hampers | November 13th, 2010 at 10:25 am
Have used the backlink checker suggested in entry 27 above and have to recommend it.
Matt Brading | September 27th, 2011 at 11:15 am
Just wondering if anyone can post a link to this script in action?
I’ve tried calling it from a test page using the function call … backlink(‘http://www.example.com’) … but I only get a blank page, so guessing I’m doing it wrong?
Any suggestions or examples?
Thanks!
Matt
John Faulds | September 27th, 2011 at 11:21 am
Matt, if you follow the link to the Site Explorer Inbound Links API from the article you’ll see there’s a notice at the top saying the service was shut down on September 15 which is probably why it’s not working.
Matt Brading | September 27th, 2011 at 11:30 am
OK, thanks for the quick response John.
Kang | December 14th, 2011 at 5:11 pm
Any other idea for back links please
John Faulds | December 14th, 2011 at 5:16 pm
There’s a few listed here.