BookAdder Docfile:
Tuning Your Search,
and other exciting stuff

BookAdder: Search Tuning

But First!

Please be sure you have already installed and done a trial first run with BookAdder, as set forth in the preceding BookAdder docfile Installing BookAdder. Please: RTFM!


Some SEO Numerology

It's worth our while to take a moment to review what we're doing and why and how.

If we choose our search phrase so that at each division it will give us at least 4,000 titles, we will in actuality end up with an average of 1,550 titles per division from that 4,000. Across the six Amazon divisions, that's 9,300 titles--site pages--in all. But wait! There's more! (As they say on TV.) Each title, at each division, actually produces five site pages for you: one for the new-book-at-Amazon page, one for the used-copies-at-Abebooks page for that ISBN, one for the change-Abe-parameters page for that ISBN, one for the Abebooks search for that title/author combination, and one for the Abebooks-parameters page for that title/author combination. So the 9,300 titles become 46,500 distinct new pages on your site.

(In strictest exactness, you may not get precisely 5.00 pages per title, in that Amazon has a few specialty items Abebooks doesn't have, but in practical reality the value will range from about 4.90 to 4.99 pages per title, plus you get another 32 fixed pages per division no matter the titles count.)
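If you'd like to check that arithmetic yourself, here is a minimal Python sketch. The function name estimate_pages is made up for illustration; the figures (six divisions, 1,550 titles each, 4.90 to 4.99 pages per title, 32 fixed pages per division) are just the ones quoted above.

```python
DIVISIONS = 6      # US, UK, CA, DE, FR, JP
FIXED_PAGES = 32   # fixed pages per division, per the text above


def estimate_pages(titles_per_division, pages_per_title):
    """Rough count of new site pages across all divisions."""
    title_pages = DIVISIONS * titles_per_division * pages_per_title
    return round(title_pages + DIVISIONS * FIXED_PAGES)


nominal = DIVISIONS * 1550 * 5       # the round "five pages per title" figure
low = estimate_pages(1550, 4.90)     # pessimistic pages-per-title value
high = estimate_pages(1550, 4.99)    # optimistic pages-per-title value
print(nominal, low, high)
```

The point is only that the fixed pages roughly make up for the few titles that fall short of five pages each.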

In times gone by, there could have been some question as to whether all those pages were "useful" in an SEO sense, in that searchbots were widely rumored to read no more than the first 101 kilobytes of any web-page file. Whether that is or ever was true is, however, no longer relevant, owing to the wonderful advent of sitemaps. This is not a tutorial on sitemaps--there already is a satisfactory one at sitemaps.org. It is merely the observation that we no longer need worry about whether we have "saturated" a given page of links by either sheer byte length or by number of links. We can now tell the search engines directly about all our pages. We still need to have those links out there on our own pages, but the sitemaps now carry the real load.

If you're wondering if these pages really do end up indexed, all I can say is that my own all do. I should emphasize again: these are not "junk" pages or scrapings or search results or any of the other garbage the search engines so rightly complain about. These are honest, functional, relevant, dynamic, content-rich pages that are a true augmentation of your site in the fullest sense, not just by the numbers.


Constructing Phrases

The mechanics of the phrase are important. The search is an Amazon "Power Search" of books, and the field used is "keywords". The syntax is Boolean, which you need to know. If you need a full tutorial on how Boolean phrase construction works, find one with a search engine. Here is the bare outline (which really ought to be enough).


Boolean Combinations

Boolean phrase-construction uses the logical operators and, or, and not to mix and match various ordinary words. You can also use parentheses to group "sub-phrases" into a larger whole. It's probably easier to show than to try to explain at length.

  • pets would just search for books relating to "pets"
  • pets not cats would search for books relating to "pets" but would exclude books relating to cats
  • cats or dogs would search for books relating either to cats or to dogs
  • cats and dogs would search for books relating both to cats and to dogs--books on cats only or dogs only would be left off
  • pets not (cats or dogs) would search for books relating to pets, but would exclude books about either cats or dogs--it would return books on pets that are neither cats nor dogs, but none involving cats or dogs or both

When you need to combine words into a fixed phrase, you can group them inside double-quote marks. A search for horror movies might return books on horror and books on movies besides books specifically on horror movies; but if you search for "horror movies", the search engine understands that you mean the words as a phrase. (You could also search for movies and horror, but that--in principle, if the engine is as smart as one hopes--would give slightly different results, in that the phrase inside the quote marks would be used direct, so "horror movies"--or "horror movie"--would need to appear as such in the description, whereas the and'ed search only needs for each of the words to occur, in any order and at any separation.)
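For those who think better in code, the Boolean combinations listed above can be mimicked with a toy Python sketch. Everything in it is invented for illustration: the book titles, the keyword sets, and the search helper. It says nothing about how Amazon actually indexes or matches books; it only shows the logic of and, or, not, and parentheses.

```python
# Toy catalogue: each made-up title maps to a made-up set of keywords.
books = {
    "Caring for Pets": {"pets"},
    "Cats as Pets": {"pets", "cats"},
    "Dogs as Pets": {"pets", "dogs"},
    "Cats and Dogs Together": {"pets", "cats", "dogs"},
    "Pet Parrots": {"pets", "parrots"},
    "Wild Dogs": {"dogs"},
}


def search(predicate):
    """Return the sorted titles whose keyword set satisfies the predicate."""
    return sorted(title for title, kw in books.items() if predicate(kw))


# pets not cats
pets_not_cats = search(lambda kw: "pets" in kw and "cats" not in kw)
# cats or dogs
cats_or_dogs = search(lambda kw: "cats" in kw or "dogs" in kw)
# cats and dogs
cats_and_dogs = search(lambda kw: "cats" in kw and "dogs" in kw)
# pets not (cats or dogs)
pets_not_either = search(
    lambda kw: "pets" in kw and not ("cats" in kw or "dogs" in kw)
)
print(pets_not_either)
```

Note how the parenthesized version drops every title that mentions cats, dogs, or both, exactly as described in the last bullet.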


Principles of Building

The general way to proceed is to start with the narrowest phrase reasonably likely to produce some non-trivial number of titles. The narrower the scope, the more strongly relevant the list will be to your site's theme. For example, if you have a site about the history museums of Ritzville, Washington, you don't even have to try to realize that you're just not going to get anywhere with Ritzville and history as your phrase (and sure enough, it produces zero results everywhere). But you can start by generalizing just a bit: try "washington state" and history; now we have at least something--535 titles in Amazon U.S. But that's still way too low. How about history and (washington and not (george or dc or d.c.))? We're getting there: 1,205 titles in Amazon U.S. Broaden out some more: history and ("pacific northwest" or (washington and not (george or dc or d.c.))) Only a little better: 1,427 in Amazon U.S. Broaden yet more: history and ("pacific northwest" or oregon or california or idaho or (washington and not (george or dc or d.c.))) Jackpot! Now we would get:

  • Amazon U.S.A. reports exactly 4,000 titles (really 6,547--see note below)
  • Amazon U.K. reports exactly 4,000 titles (really 4,793--see note below)
  • Amazon Canada reports exactly 4,000 titles (really 5,103--see note below)
  • Amazon Germany reports exactly 3,880 titles.
  • Amazon France reports exactly 4,000 titles (really 4,362--see note below)
  • Amazon Japan reports exactly 2,490 titles.

The "note" is that anything over 4,000 might as well be 4,000. OK, Amazon Germany and Amazon Japan are not saturated, but the total is good. Also, there is some question of the degree of closeness of California or Idaho or Oregon to the history museums of Ritzville, Washington--but I for one would reckon that the "history" part makes for reasonable relevance.
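The effect of that 4,000-title ceiling on the example can be sketched in a few lines of Python. The counts are the "really" figures from the list above; the variable names are just for illustration.

```python
AMAZON_CAP = 4000  # anything over this might as well be 4,000

# True title counts per division for the example phrase, from the list above.
true_counts = {
    "us": 6547, "uk": 4793, "ca": 5103,
    "de": 3880, "fr": 4362, "jp": 2490,
}

# What each division can actually deliver under the cap.
usable = {div: min(n, AMAZON_CAP) for div, n in true_counts.items()}

# Divisions that hit the ceiling ("saturated").
saturated = [div for div, n in true_counts.items() if n >= AMAZON_CAP]

total_usable = sum(usable.values())
print(saturated, total_usable)
```

Four of the six divisions are saturated, which is about as good as a phrase this tightly themed is likely to do.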


Actual Trials

So now you need to know how to quickly and easily test your various possible search phrases. Simple: just run the BookAdder script bookcount.php. At first run, it will use the phrase built into your customize.php file--right now, presumably (unless you fiddled with it already) "spokane". The page would look like this:


Search-Phrase Results:

For the search keyword phrase spokane:
  • Amazon U.S.A. reports exactly 111 titles.
  • Amazon U.K. reports exactly 63 titles.
  • Amazon Canada reports exactly 61 titles.
  • Amazon Germany reports exactly 45 titles.
  • Amazon France reports exactly 54 titles.
  • Amazon Japan reports exactly 27 titles.
Those totals include items that Amazon shows as:
  • currently for sale;
  • available for pre-order;
  • special orders;
  • new releases;
  • available but temporarily out of stock; and,
  • available for in-store pickup.
Word/Phrase to try next:




The process couldn't be easier: type your phrase into the text box, click the button, wait a few seconds (6 to 12) for the results, and there you are. (Note that the display above is not functional, as it has no way to know your site's URL.)

When you have finally settled on what you are sure you want to use as your real search phrase, do these things:

  1. with a text editor, edit it into customize.php at Step 2a;
  2. at Step 2b of customize.php, enter a nice, clean phrase to display to your visitors to show what you search for (such as, for the example used, History of the Western U.S. or--better, because shorter--Western U.S. History);
  3. upload the revised customize.php to your BookAdder directory;
  4. re-run finstall.php; and,
  5. run the script cronf.php in Test Mode (cronf.php?test=y).

What cronf does is simply start doall.php as a "free-running" (daemon) process; once it starts doall, cronf waits five seconds (pure paranoia) then exits, its job done. When you run cronf.php in Test Mode, it delivers a minimal message that will look something like this:


cronf: For URL: http://www.friends-of-red-cats.com/red-cats-book-shop/doall.php?daemon=y

cronf: Socket opened ok (16:05:13).
cronf: Response line gotten (16:05:13): HTTP/1.1 200 OK
cronf: Socket closed (16:05:18) - finished.


If you run cronf.php without the Test-Mode parameter, it displays nothing at all: your browser will show, in whatever way it does, "page loading" status for 5 or 6 seconds, then nothing. Meanwhile, in either style, doall is set running and functions just as it did when you ran it by hand, except that doall not having received any "testing" parameter, it simply does its job silently in the background, not sending any text output anywhere. The only way for you to keep track of what is going on--if you feel a need--is to use the status.php script, as explained earlier. But just figure on your ongoing first "real" run (a run, that is, using your real search phrase) as likely to take 1½ to 2 hours to complete. Meanwhile, there are a couple of useful ways you can fill some of that time.

Remember that if you start cronf.php in Test Mode, the entire run is done in Test Mode, so no sitemaps are made; to get sitemaps made, you must start cronf.php with no parameters.

Updating Your robots.txt File

General robots.txt Considerations

As you should and doubtless do know, your robots.txt, which must reside in your site's root directory, exists to tell searchbots certain things--in particular, to tell them which directories and files to not try to list in their indexes. There can be many reasons why you don't want each and every single file on your site to be listed in some search engine: some may be php scripts no one else has any business running, some may be test or archived files, some may be private, and so on. This file allows you to exclude those files from the bots' probing. (That assumes the bots are "well behaved"--many scummy outfits and individuals ignore robots.txt directives, but BookAdder contains other ways of slowing them down, which we'll discuss later.)

Though most webmasters know about robots.txt files, and use them, a surprising number don't know their actual syntax, and so make basic errors. Let's take a moment to make sure you have it right.

You can pass different blocks of directives to different search engines. Each block needs to begin with a line that specifies which searchbots it's meant for; it specifies that bot by means of the user-agent identity that the bot recognizes. There's no a priori way to know the correct user-agent value for a given engine: you just have to look it up somewhere. Most webmasters, however, are content to use the same directives for all searchbots, which makes the block-head line very simple:

User-agent: *

Take careful note that, under the original robots.txt standard, this is the only place in a robots.txt file where any sort of "wildcard" character is allowed or has meaning. Under that standard, you cannot use wildcard characters in file or directory specifications in a robots.txt file.

Likewise under that standard, the lines that follow can only block file indexing; you can disallow, but you cannot allow. The syntax is deceptively simple:

Disallow: /custom404.php

The specs you use must start at the root of your site, which means that for files in the root directory itself, the leading slash  /  must be present. Think of the spec as what you would have to tag onto your domain URL to reach that file or directory.

Now for the interesting part: there is a form of "implicit wildcarding" available. It is built into the way searchbots are supposed to use your specifications: they match the full spec of any file they find (they never index directories) against each directive line in robots.txt that applies to that bot, and at the first line whose value matches the beginning of the filespec, for as far as that line runs, the file must be marked off as a no-no. Let's see what that means in practice. Consider the line:

Disallow: /annuals/20

That line would block, among others, all of the following files:

/annuals/20050913
/annuals/20021115
/annuals/20001012
/annuals/20070102

It would not block any of:

/annuals/19980913
/annuals/19991115
/annuals/19871012
/annuals/19500911

Have a care: the match doesn't distinguish between directory and file components. This--

Disallow: /rate

--would disallow all of, among others, these:

/rates.zip
/ratelist.txt
/rates/jan1999
/rated/a-c/arthur_maxwell.dat
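That "implicit wildcarding" is nothing more than a match on the leading characters of the path. Here is a minimal Python sketch of the rule as described above; the function name is_disallowed is invented for the example, and a real searchbot naturally does a good deal more than this.

```python
def is_disallowed(path, disallow_values):
    """True if the path begins with any non-empty Disallow value."""
    return any(
        value and path.startswith(value)
        for value in disallow_values
    )


rules = ["/annuals/20", "/rate"]

blocked = [
    p for p in (
        "/annuals/20050913",
        "/annuals/19980913",
        "/rates.zip",
        "/rated/a-c/arthur_maxwell.dat",
        "/ratio.html",
    )
    if is_disallowed(p, rules)
]
print(blocked)
```

Note that /ratio.html gets through: it begins with /rat but not with /rate, and the match must cover the whole of the directive's value.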

And last but far from least: Each directives block in a robots.txt file must be followed by a blank line--even if there's only one such block.


Updating robots.txt For BookAdder

There are many files in BookAdder--most, really--that are nobody's business but yours, and that you thus want excluded from search-engine indexes. When you ran finstall, it read your existing robots.txt file (if it found one) and produced a tentative replacement for it that includes all the necessary exclusions. That new file is named robots.new, and now resides in your main BookAdder directory. It includes your entire existing robots.txt file (if there was one) plus all the exclusions that you should have in place before opening the doors to your new Bookshop. If you have never run any version of BookAdder before, or had no existing robots.txt file, you can simply rename this new file to robots.txt and copy it into your root directory (as new, or over your existing version). You should, of course, eyeball it first to be comfortable, but this is a rather routine process.

(If you used any prior version of this software, your existing robots.txt file may already contain exclusions. It's best if you clean out the old exclusions before you run the installer for the first time--but if you didn't do that, it should be no great problem to see where the new ones begin, for they'll all be at the very bottom of the exclusions block, so you can just edit out the older ones above them.)

If this is your first use of BookAdder, you may wonder what the final directive line--

Crawl-delay: 5

--is for. That line, for those searchbots that recognize and honor it, will stop them from hitting your site any more often than once in 5 seconds, to avoid overburdening your server with searchbot hits. You can, if you like, change the number in the line (which is the minimum delay in seconds). Most bots do recognize and honor the directive, even though it is not yet a formal part of the robots.txt "official" specification. (Google does not recognize it, but guarantees that in any event it never hits any site more often than once every 6 seconds, which all veteran webmasters know is a lie, though Googlebots are usually well-behaved.) A bot that does not recognize that (or any) directive should just ignore it.
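If you want to see how a well-behaved bot would read such a file, Python's standard library happens to include a robots.txt parser. The sketch below feeds it a small made-up file (not the actual robots.new that finstall writes) and asks it a few questions; www.example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration only; note the closing blank line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /custom404.php
Disallow: /rate
Crawl-delay: 5

"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

SITE = "http://www.example.com"  # placeholder domain

may_fetch_404 = parser.can_fetch("*", SITE + "/custom404.php")
may_fetch_rates = parser.can_fetch("*", SITE + "/rates.zip")
may_fetch_index = parser.can_fetch("*", SITE + "/index.html")
delay = parser.crawl_delay("*")  # seconds, or None if not specified

print(may_fetch_404, may_fetch_rates, may_fetch_index, delay)
```

Running your own robots.txt through something like this is a cheap way to eyeball-check it before you upload it.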

Updating your robots.txt file was the first useful way of passing time while waiting for your first real search run to finish; now for the second way.


Setting Up Sitemapping

It is important that you have and use sitemaps, not just for this project but for your site as a whole. As I write this, both Google and Yahoo make use of sitemaps, and MSN is expected on board more or less any day now (they may already be by the time you read this). A sitemap is just what its name says: a map of your site. It is a list of all the pages you would like to have in a search engine's index, but it's more than just a flat list: it's a list with other helpful guidelines for the searchbots, such as how often a given page is likely to change, how important you think it is relative to the other pages on your site, and perhaps more. And it is formatted in a certain exact special way, as an XML file. For further details on sitemaps, go to the horse's mouth, the collaborative sitemaps.org site. For right now, what we are concerned with is setting your site up initially to make use of the sitemaps that BookAdder automagically makes for you at every run, because you need to get your sitemaps registered with Google and Yahoo (and maybe MSN) before they will use them.

BookAdder now makes no files outside its own directory, not even sitemaps or sitemap indexes. That means that even if you already are using sitemapping, you will need to separately set up the master map-index file for BookAdder at each engine's site. This way, there is no entanglement between the BookAdder sitemaps and any maps you may provide on your own for the rest of your site.


What Maps BookAdder Makes

Sitemaps actually come in two sorts: true maps, and "index" files, which are--in essence--maps of maps. Index maps exist because it is often much more reasonable to map a site into several distinct maps rather than one grand, all-inclusive map. (In fact, sometimes you have to make multiple maps, because the sitemap protocol imposes upper limits on number of pages per sitemap--50,000--and sitemap byte length, 10 MB.) Index sitemaps are just that, indices: they are maps of the existing actual sitemaps.

BookAdder sitemaps itself, that is, all the search-engine-indexable pages it contains and makes. It does that as six distinct map files, one for the "pages" you have for each Amazon division. The files are gzipped (which the protocol allows and the search engines, and common sense, recommend) to save searchbot time and your bandwidth use. The titles of the mapfiles BookAdder makes are of this form:

  • red-cats-book-shop-us.xml.gz
  • red-cats-book-shop-uk.xml.gz
  • red-cats-book-shop-ca.xml.gz
  • red-cats-book-shop-de.xml.gz
  • red-cats-book-shop-fr.xml.gz
  • red-cats-book-shop-jp.xml.gz

There, the red-cats-book-shop is whatever is the actual name of your main BookAdder directory. If any map were to be so large it would need to be chunked into more than one file (which will not be the case till Amazon lifts or increases that insane 4,000-item limitation, if it ever does), the files would be, say for the U.K. division:

red-cats-book-shop-uk.xml.gz
red-cats-book-shop-uk2.xml.gz

You do not, however, want to register those maps with the search engines; instead, you will be registering your site's master bookshop sitemap index file. That file is named Bookshop_index.xml, and it was automatically made for you when you ran the finstall installer script; it now resides in your main BookAdder directory, where it belongs. Every time you do a search run (except in Test Mode), the run concludes by re-making the six divisional maps listed above, then modifying the index file to reflect the date and time of the newest maps. Not only that, but when it finishes mapping and updating the index, BookAdder will then automatically notify the search engines that an updated set of maps is ready--you don't have to do anything.

(If, for any reason you ever want or need to re-create the index file afresh, you can run the ancillary BookAdder script makeindex.php; it will not over-write an existing index file, but will make its new version under the name Bookshop_index.xml.NEW.)
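To give a feel for what a sitemap index file contains, here is a rough Python sketch that builds one for six divisional maps in the sitemaps.org format. It is not BookAdder's own code, and the exact contents of the real Bookshop_index.xml may differ; the build_index function, the example domain, and the timestamp are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_index(base_url, shop_dir, divisions, when):
    """Build a sitemap index listing one gzipped map per division."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for div in divisions:
        entry = ET.SubElement(root, "sitemap")
        loc = ET.SubElement(entry, "loc")
        # Assumed layout: maps live in the main BookAdder directory.
        loc.text = f"{base_url}/{shop_dir}/{shop_dir}-{div}.xml.gz"
        lastmod = ET.SubElement(entry, "lastmod")
        lastmod.text = when.isoformat()
    return ET.tostring(root, encoding="unicode")


index_xml = build_index(
    "http://www.friends-of-red-cats.com",   # example domain
    "red-cats-book-shop",                   # example BookAdder directory
    ["us", "uk", "ca", "de", "fr", "jp"],
    datetime(2007, 1, 11, 10, 3, 9, tzinfo=timezone.utc),
)
print(index_xml)
```

Each sitemap entry carries the full URL of one map plus the date and time it was last made, which is the part BookAdder refreshes at the end of every real run.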


Registering With the Engines

If you're not using sitemaps now, you should. Here's how you go about registering for sitemap use with the search engines that use them.

The general process is this: you tell the search engine, on a special page of their site, that you want to register your site with them; usually the "registration" will include not only sitemapping but other useful services from them as well. They, in response, will give you some "magic" file, typically with a long alphanumeric scramble as its name, that you then need to post in the root directory of your site; when you have posted it, you go back to the search engine's site and tell them it's there. (That is so they can be assured that it was indeed the site owner who registered the site.)

Note! For Google, since your bookshop sitemap-index file will be in the BookAdder main directory, you have to list that directory as a separate "site" before you can register the map itself. That's just the way it works. You add your actual site for its sitemapping, then you add the BookAdder directory so you can list its map. (Yahoo seems to accept the bookshop index file with no special problem, so you don't need to add the directory as a separate "site" for them.)

As to sitemaps, they will ask you for the name or names of your sitemap files (or what Yahoo chooses to call "feeds"). Give them the full URL to your BookAdder master index file, which ought to be something of this form:

http://www.friends-of-red-cats.com/red-cats-book-shop/Bookshop_index.xml

It would be wise, though, to wait till your first real run has finished, and the maps have been made and the index updated, before registering the file with the engines.

To force sitemap making from a completed run (if, for example, it was made in Test Mode), just run the script named makemaps.php.

Mapping the Rest of Your Site

If you already have the rest of your site mapped, well and good, and you need nothing further here. If you do not have any non-BookAdder sitemaps, however, BookAdder includes a "bonus" tool that will help you make one--a script called makestatic.php, which makes an output file named staticmap.xml. There are various points about that script that you should clearly understand.

The makestatic script first looks into your robots.txt file, and will not list any file excluded there. Otherwise, it goes through your site's directories and subdirectories looking for all files with any of a few particular extensions--.html, .htm, .shtml, and .php--and adds any not excluded by robots.txt to its list (but it does not map any files in or below your BookAdder directory, because BookAdder maps those itself). When done, it converts that list to mapfile format; it then checks to see if you already have a staticmap.xml file; if you do, it further checks each entry in the tentative new file to see if there is a corresponding entry in the existing file, and if there is, it carries that existing entry's ancillary data (update frequency, relative priority) over into the new listing. Then it makes a new staticmap.xml in your BookAdder directory (it is up to you to eyeball-validate that file, then move it to your root directory, then register it with the search engines). Here are the points to consider:

  1. The script only looks for files with those four extensions and none else.

  2. It does not check for file overflow (more than 50,000 files) nor size overflow (resultant mapfile larger than 10 MB).

  3. It does not gzip its results.

  4. It might have trouble with directory or file names containing characters that are not ordinary ASCII (things like é); it might handle those ok--I hope and believe it does--but no guarantees.

  5. For files not already listed in an existing staticmap.xml, it simply plugs in an "average" priority (0.5) and a rough guess at an appropriate change frequency; you will need to hand-fix those for all files being listed for the first time.

Let me elaborate on the special-characters issue. If all of your directory and file names are "simple", meaning that they are composed wholly of alphanumeric characters--    a - z    A - Z    0 - 9--there is no difficulty. Potential problems arise if you have used any "exotic" characters in your directory or file names.

I will not rehearse here the various complications associated with "URL-encoding", "HTML-encoding", "escaping", and so on, but will simply spell out exactly how the static-map maker works. The bottom line is that you must eyeball all of the generated file listings to assure yourself that they conform to the standards of sitemap specifications.


My advice, always and ever, is to avoid using non-alphanumeric characters in directory or file names.

Your site's URL is always used as PHP reports it--typically of the form:

    http://www.mywonderfulsite.com

After that, makestatic separates the rest of the filespec, as PHP has reported it (which means as your server's filesystem reports it to PHP), into its components, using your local separator slash (which will be a plain slash  /  under Unix and a backslash  \  under Windows) as the component divider. Thus, a spec like--

    /index.html

--will have just one part, index.html. But a spec like--

    /français/puré/good&bad/some good & some bad.html

will be separated into the components:

français
puré
good&bad
some good & some bad.html

(The character between the "n" and the "a" of "français" is a "c-cedilla"; and the character at the tail of "puré" is an "e-acute"; both are non-ASCII, accented characters.)

The script then applies "URL encoding" (as required by the sitemap specifications) to each of the separated parts in turn (it works this way so that the separator slashes do not themselves get encoded). That encoding process, to quote the PHP Manual:

"Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in RFC 1738 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URL's from being mangled by transmission media with character conversions..."

Thus, the subdirectory-name and filespec examples above would be returned as:

fran%87ais
pur%82
good%26bad
some%20good%20%26%20some%20bad.html

The individual pieces are then put back together with appropriate separator slashes re-inserted, and the whole appended to the domain URL. For the given example, we would end up with:

    http://www.mywonderfulsite.com/fran%87ais/pur%82/good%26bad/some%20good%20%26%20some%20bad.html
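The split-encode-rejoin procedure just described can be sketched in Python. This is an illustration, not the makestatic code (which is PHP): the encode_path function is made up, and Python's quote stands in for PHP's encoder. One thing the sketch makes visible is that the hex digits you get for accented characters depend on the character encoding of the names: with UTF-8 the c-cedilla comes out as %C3%A7, while the %87 and %82 in the example above are what a legacy DOS code page (cp437 is assumed here) would give.

```python
from urllib.parse import quote


def encode_path(path, sep="/", encoding="utf-8"):
    """Percent-encode each path component separately, keeping the slashes."""
    parts = [part for part in path.split(sep) if part]
    encoded = (quote(part, safe="", encoding=encoding) for part in parts)
    return "/" + "/".join(encoded)


spec = "/français/puré/good&bad/some good & some bad.html"

utf8_url = encode_path(spec)                      # names taken as UTF-8
legacy_url = encode_path(spec, encoding="cp437")  # names taken as cp437

print(utf8_url)
print(legacy_url)
```

Either way, the ampersands and spaces come out the same (%26 and %20); it is only the accented characters that differ, which is one more reason to run the browser test on any questionable spec.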

Also: the encoding listed in the sitemap is "UTF-8"; the sitemap specifications require the sitemap file to be in "UTF-8" encoding, but also carefully insist that the actual filespecs be encoded so that your server will read them correctly. Presumably the form in which your server's filesystem reported them to PHP is the way in which it would be able to read them, but the easy test is to copy, exactly, any questionable spec into your browser's URL bar and see if it indeed takes you to the file in question; if it does, then the encoding is correct in that respect.

Note carefully that this initial encoding is probably incomplete--in the sense that if you have any php script files, or other files that take parameter strings, those of course are not provided. You must code them in by hand. For example, the map that makestatic.php produces might have an entry like:

http://www.mywonderfulsite.com/divisions/division-info.php

That might well represent, in effect, many different pages, depending on the parameters, or "query string", following. You would have to hand-change that to something like, to make up an example:

http://www.mywonderfulsite.com/divisions/division-info.php?nat=usa&amp;div=nyc
http://www.mywonderfulsite.com/divisions/division-info.php?nat=usa&amp;div=la
http://www.mywonderfulsite.com/divisions/division-info.php?nat=can&amp;div=ont
http://www.mywonderfulsite.com/divisions/division-info.php?nat=can&amp;div=vanc
http://www.mywonderfulsite.com/divisions/division-info.php?nat=usa&amp;div=chi
http://www.mywonderfulsite.com/divisions/division-info.php?nat=usa&amp;div=sea

Note that those entries follow the protocol's requirement that certain special characters, which include the ampersand, be translated to entities (so a bare & is written as &amp;). That can be done automatically, but not when a script has no way to know what parameters you need in your URL calls. You have to do it as you enter the strings.
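If you have a lot of such query-string URLs to enter, you can let a few lines of Python do the entity translation for you. This sketch uses the standard library's XML escape helper; the URL is the made-up example from above.

```python
from xml.sax.saxutils import escape

# A hand-built URL with its query string, as you would type it in a browser.
url = "http://www.mywonderfulsite.com/divisions/division-info.php?nat=usa&div=nyc"

# escape() turns &, < and > into XML entities, ready for a sitemap <loc>.
loc_text = escape(url)
print(loc_text)
```

Escape each URL once only: running an already-escaped string through again would turn the & of each entity into yet another entity.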

Also take note of the "change frequency" listed for each file. The makestatic script has a primitive intelligence: it looks at how long ago the file was last modified (or made) and uses that datum to select a tentative "change frequency" for the file. For example, a file last modified 45 days ago would be assigned a change frequency of yearly, since it last changed over a month ago, so that monthly seems inappropriate. That is better than giving everything some one arbitrary frequency, but it is nonetheless crude; a file last modified two days ago might be one that typically changes only rarely; a file last changed 11 months ago might be one that will soon start changing daily. Only you know; only you can check that those assignments make sense, and change them by hand where they don't.
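To make that "primitive intelligence" concrete, here is a Python sketch of an age-based guess of the same general kind. The thresholds are my assumptions for illustration, chosen only so that the 45-day example above comes out as yearly; the actual cut-offs in makestatic.php may well be different, and guess_changefreq is a made-up name.

```python
from datetime import datetime, timedelta


def guess_changefreq(last_modified, now):
    """Pick a tentative sitemap change frequency from a file's age.

    Thresholds are assumed, not taken from makestatic.php.
    """
    age_days = (now - last_modified).days
    if age_days < 1:
        return "daily"
    if age_days < 7:
        return "weekly"
    if age_days <= 31:
        return "monthly"
    return "yearly"   # last changed over a month ago


now = datetime(2007, 1, 11, 10, 0, 0)
guess_45_days = guess_changefreq(now - timedelta(days=45), now)
guess_2_days = guess_changefreq(now - timedelta(days=2), now)
print(guess_45_days, guess_2_days)
```

As the sketch shows, such a guess knows only the file's age, nothing about how the file actually behaves, which is exactly why the results need your eye.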


All in all, I provide BookAdder, and I provide assured mapping of the files that BookAdder makes. The staticmap file is a bonus, and I expressly do not guarantee it.

In reality, I think it works OK, but it is your responsibility to check every file listing in the static map. Even if you need to hand-modify a lot, it's easier than building the whole thing from scratch by hand.


Actually Registering Your Sitemaps

Of course, if you are already a registered sitemap user at all the search engines, all you may have to do--depending on your case, as described above--is point the engines at the new Bookshop_index.xml master index file. Otherwise, you need to do the initial registration first, then specify that file (as well as any other non-BookAdder ones you may now have).

Here's where to go to set up registration:

Google
Yahoo (use the Submit Site Feed entry box)
MSN (search the page for "sitemaps")

You should, of course, wait till you are satisfied that your first real search run has concluded, and satisfactorily, before registering your new sitemap-index file. (And remember the earlier tip: you don't need to re-run the entire search just to make sitemaps of its results--simply call the script makemaps.php, which will map whatever the current search results are.)


Using Your Results

When sufficient time has passed since you started the search run--give it at least an hour and a half--look into the Shop.Log file in the BookAdder /logs subdirectory to see if it's done yet. If it is done, the last line will read (with the appropriate date/time stamp):

doall: Thursday, 11 January 2007, 10:03:09: Sitemaps-make complete; doall finished

If it does not say that, but the sequence of entries seems to show it running ok, let it be for a while longer. Don't give it up as lost till 1½ or even 2 full hours have gone by without your getting that line in the logfile.

(If you have a problem here, and can't figure it out on your own, don't hesitate to e-mail me and we'll work it out for you.)

When the run is fully completed, go to your bookshop front-page script (whatever you named it in customize.php at Step 3) and make sure you're happy with everything. I do hope you are.


Opening Up Your New Shop

If the shop indeed looks ready for business, you need to provide links to its front page. And be absolutely, positively sure to remember that you need to provide six separate links, one for each divisionally based shop. The actual links will be of this form, with--of course--your own correct data plugged in:

US:  http://www.paint-red-cats-blue.com/red-cats-book-shop/paint-red-cats-book-shop.php?in=us
UK:  http://www.paint-red-cats-blue.com/red-cats-book-shop/paint-red-cats-book-shop.php?in=uk
CA:  http://www.paint-red-cats-blue.com/red-cats-book-shop/paint-red-cats-book-shop.php?in=ca
DE:  http://www.paint-red-cats-blue.com/red-cats-book-shop/paint-red-cats-book-shop.php?in=de
FR:  http://www.paint-red-cats-blue.com/red-cats-book-shop/paint-red-cats-book-shop.php?in=fr
JP:  http://www.paint-red-cats-blue.com/red-cats-book-shop/paint-red-cats-book-shop.php?in=jp

You can put those links anywhere on your current site, but why not put them somewhere on your overall front page? And if you use a templated include/dropin site directory on your pages, be sure all six are there, too. This isn't just added pages, this is money.

Meanwhile, let me repeat this: the bookshops for the divisions where English is not the native tongue will list only books in English available for sale through those divisions. If your visitor is a servicewoman stationed in Okinawa, she can order English-language books from nearby Amazon Japan, with your site--and the books you offer her--all in English.

(If you want or need to have a bookshop that searches for books in a division's native tongue, e-mail me about it and I'll work it out for you.)


Setting Up Auto-Updating

It is very important that you set your new bookshop up so that it will be updated on average once every 24 hours. (For one thing, Amazon's Terms of Service rightly don't allow displaying prices over 24 hours old.) Besides, you want your wonderful new stock of pages to change as often as possible for SEO purposes as well. Fortunately, setting this up is very simple to do, and once it is done, the whole thing takes place, day after day, invisible to you with no further effort on your part.

At bottom, you rely on whatever task scheduler your host makes available to you. I know nothing about Windows-based servers, but all Unix-based servers include the well-known facility called cron; if you are on a Windows-based server, you'll have to ask your host to help you out here--but what you want to accomplish is very simple.

Access to and use of a cron scheduler varies somewhat from host to host. Here is some general guidance, but your host is the final source of valid information on using cron on your server. Your first task will be to find where the cron interface is. Go to whatever sort of "control center" your host provides for your account and poke about--you're looking for references to either "cron" or a "scheduler" (some hosts hide this under something called "Advanced Options" or the like).

Regardless of the exact appearance of the cron interface on your host, it should be simple and intuitive to use; what you basically need to tell it is how often to run the task (in this case, daily), when to run it (you can pick a particular time, or you can be generous and let cron pick a time based on when the server's overall load is low), and what command (with what command parameters, if any) it is to run. If you pick the run time, try to select a time block, figuring about 1½ to 2 hours for the total run, when your site is likely to be least busy.

As to the command to run, there are two ways to go about this, both using as the ultimate target command the BookAdder script cronf.php; one way calls it as a local filesystem file, the other accesses it through the internet by HTTP. Curiously, the second way is simpler and more reliable, because calling a php script as a local-filesystem command can get hairy (for example, if you are cgi-wrapped, you probably need to instead call the wrapper with the command as a parameter).

To call cronf.php as a filesystem file, you'll need the complete path to the file as the "command" you specify for it. If you're unsure what that is, just re-run the tryme1st.php script: it'll include, in that big info box up near the top of the screen, that line that goes:

Path to this directory on server: /usr/www/users/catfiend/public_html/catsite/red-cats-book-shop/

just take what you see there and tag cronf.php onto it, so you get something like--

/usr/www/users/catfiend/public_html/catsite/red-cats-book-shop/cronf.php

--as the command to tell cron to run. No parameters are necessary.

But the way I recommend--assuming your host makes the wonderful WGET utility available, as almost all good hosts do--is to use HTTP. This time, what you need to know is the full path on your host's server to the WGET utility; you'll have to ask about that, but typically the full command you specify to cron might look something like this:

/usr/local/bin/wget -q -O - http://friends-of-red-cats.com/red-cats-book-shop/cronf.php > /dev/null

There, /usr/local/bin/ is the path, on your host's server, to the directory in which the WGET utility is; -q and -O - are flags being passed to WGET (-q keeps WGET quiet, and -O - tells it to write whatever it fetches to standard output instead of saving it as a file); and > /dev/null tells the system to just dump that output--any text read from the called script--into "the bit bucket" (oblivion, the null device). Again: the exact form you will need you must get from your host, but it should look something like that (excepting, of course, for the substitution of your true site URL and the path to your actual bookshop directory).
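Some hosts give you a raw crontab to edit rather than a fill-in form. In that case the schedule and the command go together on one line: five time fields (minute, hour, day of month, month, day of week) followed by the command. The sketch below just prints such a line; the 03:15 run time is an arbitrary assumption, and the WGET path and URL are the example values from above, so check all of them against what your host tells you.

```shell
#!/bin/sh
# Assumed values: substitute what your host tells you.
WGET=/usr/local/bin/wget    # path to wget, as in the example in the text
URL=http://friends-of-red-cats.com/red-cats-book-shop/cronf.php
MINUTE=15                   # hypothetical run time: 03:15 server time
HOUR=3

# Build a crontab line: minute hour day-of-month month day-of-week command.
# The three asterisks mean "every day of the month, every month, every weekday".
cron_line() {
    printf '%s %s * * * %s -q -O - %s > /dev/null\n' "$MINUTE" "$HOUR" "$WGET" "$URL"
}

cron_line
```

The printed line is what you would paste into the crontab; a daily run needs only the minute and hour filled in.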

(The CURL utility works in much the same way as WGET, and could probably be used instead if you know its parameter syntax.)
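For what it's worth, here is a sketch of what the CURL form of the command might look like. I have not tested it against BookAdder; the path /usr/bin/curl is a guess (ask your host), and the URL is the same example as before. With CURL, -s keeps it silent and -o /dev/null sends the fetched text to the bit bucket.

```shell
#!/bin/sh
# Assumed values: substitute what your host tells you.
CURL=/usr/bin/curl          # hypothetical path to curl on your server
URL=http://friends-of-red-cats.com/red-cats-book-shop/cronf.php

# Build the command to hand to cron:
#   -s            silent (no progress meter)
#   -o /dev/null  discard whatever text the script returns
curl_command() {
    printf '%s -s -o /dev/null %s\n' "$CURL" "$URL"
}

curl_command
```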

So, again, in summary:

  1. find out from your host the exact path specification for calling the WGET utility on your server;
  2. also find out how to get access to and use the cron scheduler;
  3. set cron to run a command daily, at a time of your or its choosing; and,
  4. set the command to be run as /usr/local/bin/wget -q -O - http://friends-of-red-cats.com/red-cats-book-shop/cronf.php > /dev/null (using your host's actual path to WGET and the appropriate URL values for your site and bookshop).
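If your host happens to give you shell (SSH) access, which is an assumption and by no means universal, you can often answer the first question yourself. This sketch asks the shell where a utility lives; if it comes up empty, the utility may still exist somewhere off the search path, so ask your host.

```shell
#!/bin/sh
# Print the full path of a utility, or a note if it is not on the search path.
find_tool() {
    if tool_path=$(command -v "$1"); then
        printf '%s\n' "$tool_path"
    else
        printf '%s: not found on the search path\n' "$1"
    fi
}

find_tool wget
find_tool curl
```

Whatever path it prints for wget is what goes at the front of the cron command.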

It's easier to do than to tell about.

You should check your logfile, and the shop itself, the first two or three days to make sure everything looks right; after that, try to remember to check every month or so to make sure nothing has gone astray for any weird reason.

Und dot's dot.


Moving On


What to Read Next

You are now ready to further personalize BookAdder (which includes setting it up for regular automated runs). Proceed to the Further Customizing BookAdder docfile.