4WebHelp

 robots.txt
Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
Posted: Mon Apr 15, 2002 10:56 pm

I want search engines to include only my home page when they crawl my website; I don't want any other pages added. I heard about robots.txt, which seemed great, but the following method would take me ages because of the number of pages I have:
Code:
User-agent: *
Disallow: /folder1/
Disallow: /folder2/
Disallow: /folder3/
Disallow: /file1.html
# etc.


Is there an easier way, without listing every page in the robots.txt file or placing code in each page I don't want included?

Thanks :)

Darren
Team Member
Joined: 05 Feb 2002
Posts: 549
Location: London
Posted: Tue Apr 16, 2002 7:23 am

Quote:
To exclude all files except one
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "docs", and leave the one file in the level above this directory:

User-agent: *
Disallow: /~joe/docs/

From:
http://www.robotstxt.org/wc/exclusion-admin.html

Or put a robots <meta> tag in each page you want excluded:

Code:
<html>
<head>
<meta name="robots" content="noindex,nofollow">
<meta name="description" content="This page ....">
<title>...</title>
</head>
<body>

See here for more details:
http://www.robotstxt.org/wc/meta-user.html
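
If you go the <meta> route, it may help to check which pages actually carry the tag. Below is a rough Python sketch using only the standard library's html.parser. The names RobotsMetaParser and robots_directives, and the sample page, are made up for illustration; this is not a tool from either of the sites linked above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical sample page, modelled on the snippet above.
sample_page = (
    '<html><head>'
    '<meta name="robots" content="noindex,nofollow">'
    '<title>...</title>'
    '</head><body></body></html>'
)
print(sorted(robots_directives(sample_page)))  # ['nofollow', 'noindex']
```

A page with no robots <meta> tag gives an empty set, which a crawler that honours the tag would normally treat as "index and follow".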

Darren
Team Member
Joined: 05 Feb 2002
Posts: 549
Location: London
Posted: Tue Apr 16, 2002 7:27 am

Just found this, which contradicts what the other site says. Maybe it's newer?

Quote:
The record starts with one or more User-agent lines, specifying which robots the record applies to, followed by "Disallow" and "Allow" instructions to that robot. To evaluate if access to a URL is allowed, a robot must attempt to match the paths in Allow and Disallow lines against the URL, in the order they occur in the record. The first match found is used. If no match is found, the default assumption is that the URL is allowed. For example:

User-agent: webcrawler
User-agent: infoseek
Allow: /tmp/ok.html
Disallow: /tmp


See here for more details:
http://www.wdvl.com/Location/Search/Robots.html
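
The first-match evaluation described in that quote can be tried out with Python's standard urllib.robotparser, which also checks the rules of a record in the order they appear. This is only a sketch: www.example.com and "someotherbot" are placeholders, and a real crawler may evaluate rules differently (the later RFC 9309 standard, for instance, prefers the longest matching path rather than the first one).

```python
from urllib.robotparser import RobotFileParser

# The record quoted above, verbatim.
RULES = """\
User-agent: webcrawler
User-agent: infoseek
Allow: /tmp/ok.html
Disallow: /tmp
"""


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    # www.example.com is a placeholder host; only the path is matched.
    return parser.can_fetch(agent, "http://www.example.com" + path)


print(allowed("webcrawler", "/tmp/ok.html"))       # True: the Allow line matches first
print(allowed("webcrawler", "/tmp/other.html"))    # False: falls through to Disallow: /tmp
print(allowed("webcrawler", "/index.html"))        # True: no rule matches
print(allowed("someotherbot", "/tmp/other.html"))  # True: the record does not name this robot
```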

Daniel
Team Member
Joined: 06 Jan 2002
Posts: 2564
Posted: Tue Apr 16, 2002 9:03 am

I think both methods will work; they're just different ways of doing things.

Darren
Team Member
Joined: 05 Feb 2002
Posts: 549
Location: London
Posted: Tue Apr 16, 2002 9:14 am

When I said they contradicted each other, I was referring to the robots.txt info: one site said there was no 'Allow' field and the other said there was. :lol:

Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
Posted: Tue Apr 16, 2002 9:31 am

So, how should I write it to allow only my home page? Something like:
Code:
User-agent: *
Allow: index.htm


Thanks :)

Daniel
Team Member
Joined: 06 Jan 2002
Posts: 2564
Posted: Tue Apr 16, 2002 9:35 am

Try that and see if it works, but personally I would use the <meta> tag (not that it's better).

Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
Posted: Tue Apr 16, 2002 1:04 pm

But the <meta> tag is not very widely accepted. Why would you pick this option, Daniel?

Thanks

Daniel
Team Member
Joined: 06 Jan 2002
Posts: 2564
Posted: Tue Apr 16, 2002 6:22 pm

Don't know why. I didn't know it wasn't widely accepted. Where did you find that out?

Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
Posted: Tue Apr 16, 2002 9:27 pm

I saw it on this website:
http://www.robotstxt.org/wc/meta-user.html

Thanks

Peter
Team Member
Joined: 09 Jan 2002
Posts: 147
Location: UK
Posted: Wed Apr 17, 2002 12:26 pm

Richy wrote:
So, how should I write it to allow only my home page? Something like:
Code:
User-agent: *
Allow: index.htm


Thanks :)


Try:
Code:
User-agent: *
Allow: /index.htm
Disallow: /


Which should block everything apart from index.htm.

But the question is why should you want to do this?
Peter.
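
For anyone who wants to test a home-page-only rule set before uploading it, here is a minimal sketch with Python's standard urllib.robotparser. It assumes that paths begin with "/" and that the Allow line comes before "Disallow: /", because this particular parser uses the first rule that matches. www.example.com and "ExampleBot" are placeholders, and real crawlers may not all behave like this parser (some older robots ignore Allow entirely).

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: allow the home page file, block everything else.
HOME_ONLY = """\
User-agent: *
Allow: /index.htm
Disallow: /
"""

parser = RobotFileParser()
parser.parse(HOME_ONLY.splitlines())


def can_crawl(path):
    """Return True if a generic robot may fetch `path` under HOME_ONLY."""
    # "ExampleBot" and www.example.com are placeholders for illustration.
    return parser.can_fetch("ExampleBot", "http://www.example.com" + path)


print(can_crawl("/index.htm"))       # True
print(can_crawl("/folder1/a.html"))  # False
print(can_crawl("/"))                # False
```

Note the last line: with plain prefix matching the bare "/" URL is blocked as well, so the home page would need to be linked and submitted as /index.htm. Matching is by prefix, so /index.html would also be allowed by this rule.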

________________________________
Maple Design - quality web design and custom programming

Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
Posted: Wed Apr 17, 2002 4:32 pm

Another one of my sites is going through a major change. The only page currently ready is the home page, but building the rest of the new site will be a lengthy process, and I want the site listed in the search engines as soon as possible. As time goes on, I will include the new pages as I add them to the site.

Thanks for everybody's help :)

Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
Posted: Sun Apr 21, 2002 10:56 am

Another question: will
Code:
Disallow: *.gif

be accepted by the search engines?

Thanks :)

Darren
Team Member
Joined: 05 Feb 2002
Posts: 549
Location: London
Posted: Sun Apr 21, 2002 12:48 pm

Not sure, but why not have all your images in one directory, and do this:

Code:
Disallow: /images/


:?:

Keeps your files nice and tidy as well :lol:
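
To compare the two approaches, here is a small sketch using Python's standard urllib.robotparser, which does plain prefix matching in the style of the original robots.txt convention and so does not treat "*.gif" as a pattern. The function name is_blocked, the host and the file paths are placeholders. Some newer crawlers do understand "*" and "$" patterns, so take this as one parser's behaviour, not a rule for every search engine.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(rules, path):
    """Return True if `rules` stop a generic robot from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # "ExampleBot" and www.example.com are placeholders for illustration.
    return not parser.can_fetch("ExampleBot", "http://www.example.com" + path)


wildcard_rules = "User-agent: *\nDisallow: *.gif\n"
directory_rules = "User-agent: *\nDisallow: /images/\n"

print(is_blocked(wildcard_rules, "/pics/logo.gif"))     # False: the pattern is not understood
print(is_blocked(directory_rules, "/images/logo.gif"))  # True: prefix /images/ matches
print(is_blocked(directory_rules, "/index.htm"))        # False: outside /images/
```

So the directory approach works even with a parser that knows nothing about wildcards.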

Justin
4WebHelp Addict
Joined: 07 Jan 2002
Posts: 1060
Posted: Sun Apr 21, 2002 12:53 pm

And if you want to be listed by the major search directories, use a meta description tag. We give preference to sites that include one, because it saves us from having to write a description for you. But remember NOT to use hype, or we won't list you.

All times are GMT.