Checking backlinks with Yahoo’s Site Explorer Inbound Links API

Recently I wanted to create a page that would list the backlinks to certain sites that I had chosen. Yahoo always seems to have the most comprehensive list of backlinks, and luckily they also make available a series of APIs that enable you to access their data. Using the Site Explorer Inbound Links API, and the example script towards the bottom of that page, I was able to put together a script whose results closely mirror what you would get if you entered a site’s URL into Yahoo’s Site Explorer.

The only problem with these results is that you often end up with lots of links from the same domain if, for example, you have a link to your site in a forum signature, or the link appears in a sidebar of a blog with lots of pages. I wanted to manipulate the results sent back from Yahoo to only display the first one or two links from any one domain regardless of how many links there were.

I started first with the example from Yahoo used to extract the information (you’ll need to get an API key from Yahoo to get this to work on your own site):

$api_service_url = "http://search.yahooapis.com/SiteExplorerService/V1/inlinkData";
$apiid = "your_api_key_goes_here:_get_it_from_yahoo";
$query = $input_url; // can be hard-coded or receive a value from a function
$entire_site  = "";  // "1" to provide results for the entire site
$omit_inlinks = "domain";
$linksperrequest = 100;   // 100 is max value
$startposition = 1;

$request_url = $api_service_url."?appid=".$apiid."&query=".urlencode($query)."&entire_site=".$entire_site."&omit_inlinks=".$omit_inlinks."&output=php";

$currentpos = 0;
while ($currentpos++ >= 0) {
    $requrl = sprintf("%s&start=%s&results=%s", $request_url, ($currentpos-1)*$linksperrequest+$startposition, $linksperrequest);
    if (($content = file_get_contents($requrl)) === FALSE ) {
        echo "HTTP error: $requrl";
        exit;
    } else {

        $data = unserialize($content);
        if (array_key_exists("ResultSet", $data)) {
            for ($i=0; $i<sizeof($data["ResultSet"]["Result"]); $i++) {
                $url = $data["ResultSet"]["Result"][$i]["Url"]; // backlink URL
                $title = $data["ResultSet"]["Result"][$i]["Title"]; // page title for the backlink

            }
        } else {
            echo "Error: Bad response from server";
        }

         if (sizeof($data["ResultSet"]["Result"]) < $linksperrequest) break;
    }
}

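The API only lets you process 100 results per request, so the script cycles through the results 100 at a time until it reaches the end. (As pointed out in the comments below, Yahoo’s documentation also caps the finishing position at 1,000, so requests beyond that point fail with a 400 Bad Request.) As it does so, the URL and page title of each page linking back to you are captured with: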
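The paging arithmetic in that loop can be sketched on its own. This is only an illustration: backlink_start_positions is a made-up helper name, and the 1,000 cap is taken from the API documentation quoted in the comments below rather than from the original script.

```php
<?php
// Hypothetical helper (not part of Yahoo's example): list the start
// positions to request, stopping before the finishing position
// (start + results - 1) would exceed $cap.
function backlink_start_positions($perRequest = 100, $cap = 1000, $first = 1) {
    $starts = array();
    for ($start = $first; $start + $perRequest - 1 <= $cap; $start += $perRequest) {
        $starts[] = $start;
    }
    return $starts;
}

$starts = backlink_start_positions();
echo implode(', ', $starts), "\n";
```

With the defaults this yields ten start positions, from 1 to 901, so the last request finishes exactly at position 1,000.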

$url = $data["ResultSet"]["Result"][$i]["Url"]; // backlink URL
$title = $data["ResultSet"]["Result"][$i]["Title"]; // page title for the backlink

Originally, I thought I might be able to use PHP’s parse_url function to extract just the domain portion of each link and then feed that into array_unique to remove all the duplicate occurrences of a domain. But that didn’t let me set a limit for how many links to permit from each domain, and it also completely removed the unique portion of the link (i.e., everything after the domain name).
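To make the drawback concrete, here is a rough sketch of that first idea. The URLs are made up for illustration; parse_url and array_unique are the standard PHP functions.

```php
<?php
// Sample URLs, invented for this example.
$urls = array(
    'http://forum.example.com/thread/1',
    'http://forum.example.com/thread/2',
    'http://blog.example.org/post/hello',
);

// Reduce each URL to its host, then drop the duplicate hosts.
$hosts = array();
foreach ($urls as $u) {
    $hosts[] = parse_url($u, PHP_URL_HOST);
}
$unique = array_values(array_unique($hosts));

print_r($unique);
```

The result holds only the two host names: the paths are gone, and there is no way to say "keep up to two links per domain".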

So I modified the Yahoo script like so:

for ($i=0; $i<sizeof($data["ResultSet"]["Result"]); $i++) {
  $url = $data["ResultSet"]["Result"][$i]["Url"]; // backlink URL
  $title = $data["ResultSet"]["Result"][$i]["Title"]; // page title for the backlink
  $domain = 'http://'.parse_url($url, PHP_URL_HOST);
  $backlinks[$domain][] = array($url, $title);
}

and then, with a bit of help from Tony Aslett and Chris..S at CSS Creator, included:

define('BACKLINK_LIMIT',2);
define('BACKLINK_TRUNCATE',1);
define('BACKLINK_ALL',0);    

foreach ($backlinks as $domain => $links) {
    if (count($links) > BACKLINK_LIMIT) $backlinks[$domain] = array_slice($links, 0, BACKLINK_TRUNCATE);
}

ksort($backlinks); // sort the filtered results by domain (the original sorted only the last domain's links)

function print_backlinks($domain, $links, $num) {
    $limit = $num ? min($num,count($links)) : count($links);
    for ($i=0; $i < $limit; $i++) {
        list($url,$title) = $links[$i];
        echo '<li><a href="'.$url.'">'.$title.'</a></li>';
    }
}    

echo '<ul>';
foreach ($backlinks as $domain => $links) {
    if (count($links) > BACKLINK_LIMIT) {
        print_backlinks($domain, $links, BACKLINK_TRUNCATE);
    } else {
        print_backlinks($domain, $links, BACKLINK_ALL);
    }
}
echo '</ul>';

This defines some constants: BACKLINK_LIMIT – the number past which I want to manipulate entries; BACKLINK_TRUNCATE – the number of links to display from the domains that have more than the number of links specified in BACKLINK_LIMIT; BACKLINK_ALL – self-explanatory.

Then a foreach loop filters all the link results from the Yahoo API: if any domain has more links than the limit specified, array_slice removes all of that domain’s links after the point set with BACKLINK_TRUNCATE.
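The grouping and filtering steps can be tried without the API. The following is a minimal sketch with invented sample data in the same shape as $data["ResultSet"]["Result"]; limit_per_domain is a hypothetical wrapper around the loop described above, not a function from the original script.

```php
<?php
// Made-up results in the same shape as $data["ResultSet"]["Result"].
$results = array(
    array('Url' => 'http://forum.example.com/thread/1', 'Title' => 'Thread one'),
    array('Url' => 'http://forum.example.com/thread/2', 'Title' => 'Thread two'),
    array('Url' => 'http://forum.example.com/thread/3', 'Title' => 'Thread three'),
    array('Url' => 'http://blog.example.org/a', 'Title' => 'Post A'),
    array('Url' => 'http://blog.example.org/b', 'Title' => 'Post B'),
);

// Group each link under its domain, as in the modified Yahoo loop.
$backlinks = array();
foreach ($results as $result) {
    $domain = 'http://' . parse_url($result['Url'], PHP_URL_HOST);
    $backlinks[$domain][] = array($result['Url'], $result['Title']);
}

// Hypothetical helper: cut any domain with more than $limit links
// down to its first $truncate links.
function limit_per_domain(array $backlinks, $limit, $truncate) {
    foreach ($backlinks as $domain => $links) {
        if (count($links) > $limit) {
            $backlinks[$domain] = array_slice($links, 0, $truncate);
        }
    }
    return $backlinks;
}

$filtered = limit_per_domain($backlinks, 2, 1);
foreach ($filtered as $domain => $links) {
    echo $domain, ': ', count($links), "\n";
}
```

Here the forum domain has three links, which is over the limit of two, so it is cut down to one; the blog domain has exactly two and is left alone.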

After sorting the now filtered array, it is run through another foreach loop to print the results to the screen using a function called print_backlinks. This time, if there are more links than the chosen limit, BACKLINK_TRUNCATE is passed to the function, which imposes that value as the upper limit of passes for the loop (rather than looping through all the results).
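As a variation on the printing step, the sketch below builds the list as a string and escapes each URL and title with htmlspecialchars, since page titles returned by an external service may contain characters such as & or <. The function name render_backlinks and the sample data are assumptions for illustration; the original script echoes the values directly.

```php
<?php
// Hypothetical helper: build the <ul> as a string, escaping each value.
function render_backlinks(array $backlinks) {
    ksort($backlinks); // order the output by domain
    $html = '<ul>';
    foreach ($backlinks as $links) {
        foreach ($links as $link) {
            list($url, $title) = $link;
            $html .= '<li><a href="' . htmlspecialchars($url) . '">'
                   . htmlspecialchars($title) . '</a></li>';
        }
    }
    return $html . '</ul>';
}

// Invented, already filtered sample data.
$sample = array(
    'http://forum.example.com' => array(
        array('http://forum.example.com/thread/1', 'Thread one'),
    ),
    'http://blog.example.org' => array(
        array('http://blog.example.org/a', 'Tom & Jerry'),
    ),
);

$html = render_backlinks($sample);
echo $html, "\n";
```

Returning a string instead of echoing makes it easier to drop the list into a template later.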

The end result should now look like:

function backLink($input_url) {
  $api_service_url = "http://search.yahooapis.com/SiteExplorerService/V1/inlinkData";
  $apiid = "your_api_key_goes_here";
  $query = $input_url;
  $entire_site  = "";
  $omit_inlinks = "domain";
  $linksperrequest = 100;
  $startposition = 1;
  $request_url = $api_service_url."?appid=".$apiid."&query=".urlencode($query)."&entire_site=".$entire_site."&omit_inlinks=".$omit_inlinks."&output=php";
  $currentpos = 0;
  $backlinks = array(); // collected links, grouped by domain; stays empty if nothing is returned
  while ($currentpos++ >= 0) {
      $requrl = sprintf("%s&start=%s&results=%s", $request_url, ($currentpos-1)*$linksperrequest+$startposition, $linksperrequest);
      if (($content = file_get_contents($requrl)) === FALSE ) {
          echo "HTTP error: $requrl";
          exit;
      } else {
          $data = unserialize($content);
          if (array_key_exists("ResultSet", $data)) {
              for ($i=0; $i<sizeof($data["ResultSet"]["Result"]); $i++) {
                $url = $data["ResultSet"]["Result"][$i]["Url"];
                $title = $data["ResultSet"]["Result"][$i]["Title"];
                $domain = 'http://'.parse_url($url, PHP_URL_HOST);
                $backlinks[$domain][] = array($url, $title);
              }
          } else {
              echo "Error: Bad response from server";
          }
           if (sizeof($data["ResultSet"]["Result"]) < $linksperrequest) break;
      }
  }
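  // define the constants only once so backLink() can be called more than once per page
  if (!defined('BACKLINK_LIMIT')) {
      define('BACKLINK_LIMIT',2);
      define('BACKLINK_TRUNCATE',1);
      define('BACKLINK_ALL',0);
  }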

  foreach ($backlinks as $domain => $links) {
      if (count($links) > BACKLINK_LIMIT) $backlinks[$domain] = array_slice($links, 0, BACKLINK_TRUNCATE);
  }

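  ksort($backlinks); // sort the filtered results by domain
  // declare the helper only once; redeclaring it on a second call would be a fatal error
  if (!function_exists('print_backlinks')) {
      function print_backlinks($domain, $links, $num) {
          $limit = $num ? min($num,count($links)) : count($links);
          for ($i=0; $i < $limit; $i++) {
              list($url,$title) = $links[$i];
              echo '<li><a href="'.$url.'">'.$title.'</a></li>';
          }
      }
  }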
  echo '<ul>';

  foreach ($backlinks as $domain => $links) {
      if (count($links) > BACKLINK_LIMIT) {
          print_backlinks($domain, $links, BACKLINK_TRUNCATE);
      } else {
          print_backlinks($domain, $links, BACKLINK_ALL);
      }
  }
  echo '</ul>';
}

So although the output of this script closely resembles the results you would get from Yahoo’s Site Explorer, for me it has three advantages:

  1. Using an array or by linking it to a database select query, you can display the backlink results for any number of sites, not just one, in the same location, or even on the same page.
  2. It enables me not only to customise the output of the results but also to display them in a template of my own choosing, which means they can be integrated into client-only sections of websites.
  3. It enables me to filter out lots of repetitive backlinks to make the overall display more readable, and therefore more usable.
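The first advantage can be sketched as a loop over a list of sites. This is only an outline under stated assumptions: backlinks_for_sites and fake_fetch are hypothetical names, and the stub stands in for the real API call so that the example runs without network access.

```php
<?php
// Hypothetical helper: collect grouped backlinks for several sites.
// $fetch is a stand-in for whatever returns the backlinks for one site.
function backlinks_for_sites(array $sites, $fetch) {
    $report = array();
    foreach ($sites as $site) {
        $report[$site] = call_user_func($fetch, $site);
    }
    return $report;
}

// Stub used only so the example is runnable; it returns invented data.
function fake_fetch($site) {
    return array(
        'http://blog.example.org' => array(
            array('http://blog.example.org/a', 'Links to ' . $site),
        ),
    );
}

$sites = array('http://www.example.com', 'http://www.example.net');
$report = backlinks_for_sites($sites, 'fake_fetch');
foreach ($report as $site => $backlinks) {
    echo $site, ': ', count($backlinks), " linking domain(s)\n";
}
```

In practice the list of sites could come from a database query, as described in the first point above.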

Tags: backlinks, PHP, SEO, Yahoo

36 responses to Checking backlinks with Yahoo’s Site Explorer Inbound Links API.

Comments

  1. 1

    I’m not a coder so bear with me but I stumbled on this script when searching for a way to use the yahoo api to download more than 1000 links. It looks like it will work, I’m just having a tough time understanding where I put in my site name that I’d like to query. I see the yahoo api key portion but, again, not being able to php my way out of a wet paper bag, I’m stumped as to where to insert my site name. Thanks for the help!

  2. 2

    Hi Randy, the function requires a URL to be input – function backLink($input_url) – which you can call from anywhere in your page with backlink('http://www.example.com') or in my case I used a value from a database query as it looped through the records.

  3. 3

    Sweet

  4. 4

    Very cool man, I just stumbled on this, and it makes me want to jump back into PHP again…
    Yahoo exp can get annyoying how it always lists its own domain…

  5. 5

    It seems that an error happens for sites with more than 1,000 back links. Every time I try with such a site I get this error message:

    failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request

    And it always happens when start=1001. For sites with fewer links, it always works though. Am I doing something wrong, or is there a problem handling more than 1,000 back links?

  6. 6

    @Oscar, the limit for links per request is 100 (set by Yahoo) so you can’t make $linksperrequest more than that. But the script is set up so that it will keep being submitted until no more links are found and the results are appended to each other.

  7. 7

    But Yahoo’s official API documentation says that:

    “The starting result position to return (1-based). The finishing position (start + results – 1) cannot exceed 1000.”

    The script increments the start value in a loop, and every time it reaches at total of over 1,000 (let’s say start=1001&results=100) it crashes for me, because Yahoo is returning a 400 Bad Request error.

    I appreciate your help, but I just can’t see how the script would circumvent Yahoo’s 1,000 limit. Have you tested it with a site that has more than 1,000 links?

  8. 8

    @Oscar: you’re quite right – I haven’t tested it on a site with more than 1000 links and wasn’t aware of that part of the API docs. Sorry for the confusion.

  9. 9

    very usefull article. it is more easy to get back links from Yahoo than Google.

  10. 10

    How do sites like linkdiagnosis.com return thousands of backlink results? They say the Yahoo API is used, but I’m kinda stumped.

  11. 11

    You’d have to ask them. Looks like an interesting tool though.

  12. 12

    Why my domain not yet pagerank?

  13. 13

    Enny, this article’s about Yahoo’s Site Explorer API; pagerank is a Google feature – you’d have to ask them.

  14. 14

    Hi,

    I’ve come across this tool today… Really nice one!

    However when I downloaded it and installed at the server (with proper AppId), I keep getting error saying: “failed to open stream: HTTP request failed! HTTP/1.1 999 Rate Limit Exceeded in /inlinks/backlink.php on line 15″

    I know this is connected with query limit but even if I change IP (I have dynamic IP connection) nothing happens… Tried this several times and doesn’t work :/ Any ideas what’s wrong?

  15. 15

    Well, I guess I’ll parse the XML data using SimpleXML Class on PHP 5, it’s simple and easier. Anybody knows the similar API for Google instead?

  16. 16

    Many thanks.
    This is a great code that i’m looking for and thanks yahoo for the API.

  17. 17

    Try BackLinkStat.com – Get detailed backlinks report of your site for FREE!

  18. 18

    @oscar
    try the code to stop crashes:

    while ( ($currentpos++ >= 0) && ( (($currentpos-1)*$linksperrequest+$startposition) < 999) ) {

    works fine for me.

  19. 19

    Many thanks. You saved me time! Good job guys!!

  20. 20

    I have try it on searching with my website url. But That’s make an error which can’t more than 1000 links to detect. Can you fix it, so the crawl can detect more than 1000 link’s or more.. Please tell me. Thank’s for good article.

  21. 21

    Excellent script thanks for sharing it. I will try this on my website.

  22. 22

    I wrote an asp.net version.
    Thanks for the inspiration.

  23. 23

    Nice work Henrik. :)

  24. 24

    Thanks John. Great artice.
    It’s a nice tool to have.

  25. 25

    site explorer lets you combine other terms to get better result. eg -site: lets you exclude a domains internal links, in i think by doing searches with a few terms like “-inurl:a” then combining and removing duplicates you can get around the 1000 backlink limit

  26. 26

    Nice Article.
    I need a help in this little bit more.

    I’ll type in a URL of the site I want the result of, then
    1) first it will go to yahoo site expolorer to see all of it’s backlinks there and store them then it will go to each site in that list and do the same thing
    actually the first bit where I put the url in it should also have an input for “depth to crawl”
    and that should tell the script how many times it should do this
    so if depth is 1
    it would store all the sites found for the URL and 1 level below
    so the same for each of the sites in that list.

    Any help for this will really appreciated.

    Thanks

  27. 27

    Great points you’ve posted here, thanks. Is completely true that yahoo is a great engine for checking your backlinks / InLinks (and a little more) as it seems more accurate than others when it comes to Inlinks…anyway I used to do the queries manualy untill i found this free tool: http://ministatus.com which also alows one to download the results in pdf file and also generates the free badge price for each and every queried websites.

  28. 28

    I think that since you wrote the code Yahoo has changed the API, as you can now ask it to filter out any links from that chosen domain. I found your code really useful though.

    Does anyone have any experience with the daily limits that Yahoo imposes? Is there anything that can be done to increase them?

  29. 29

    Thanks for the update Al. I actually haven’t done much with this code or the Yahoo API since writing the article.

  30. 30

    Have used the backlink checker suggested in entry 27 above and have to recommend it.

  31. 31

    Just wondering if anyone can post a link to this script in action?

    I’ve tried calling it from a test page using the function call … backlink('http://www.example.com') … but I only get a blank page, so guessing I’m doing it wrong?

    Any suggestions or examples?

    Thanks!

    Matt

  32. 32

    Matt, if you follow the link to the Site Explorer Inbound Links API from the article you’ll see there’s a notice at the top saying the service was shut down on September 15 which is probably why it’s not working.

  33. 33

    OK, thanks for the quick response John.

  34. 34

    Any other idea for back links please

  35. 35

    There’s a few listed here.
