How to avoid repeated spider crawls of your site's pages



Dynamic web pages are a problem that spiders face when processing page information. A dynamic web page is a page generated automatically by a program. As the Internet has developed, more and more scripting languages for building dynamic pages have appeared, such as JSP, ASP, and PHP, and so more and more kinds of dynamic pages exist. Spiders have difficulty handling pages generated by these scripting languages. That is why optimization staff always stress using as little JS code as possible: to handle these languages well, a spider needs its own script-processing program. When optimizing a website, reduce unnecessary script code so that spiders can crawl it easily, which results in fewer repeatedly fetched pages.

Repeated pages like these can be screened out through the robots file, using the Disallow syntax.

Website content changes frequently, and not only through template updates. Spiders therefore keep re-crawling pages and updating the content they hold. Spider developers set an update cycle for the crawler so that it rescans a site at specified intervals and compares pages to see which ones need updating, for example: whether the home page title has changed, which pages are new, and which pages are expired dead links. Search engines keep tuning this update cycle, because it has a large effect on search recall. If the update cycle is too long, the search engine's accuracy and completeness drop, and some newly generated pages cannot be found by search; if the update cycle is too short, it is technically harder to achieve, and it also brings ...
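The comparison step described above can be sketched roughly as follows. This is a minimal illustration and not how any particular search engine works: the function names `fingerprint` and `diff_crawl`, the use of a SHA-256 hash to detect changes, and the rule that any HTTP status of 400 or above counts as a dead link are all assumptions made for the example.

```python
import hashlib


def fingerprint(body: str) -> str:
    """Return a short, stable fingerprint of a page body (assumed: SHA-256)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def diff_crawl(previous, current):
    """Compare one crawl cycle against the last one.

    previous: {url: fingerprint} stored from the last cycle.
    current:  {url: (http_status, body)} fetched in this cycle.
    """
    report = {"new": [], "changed": [], "dead": [], "unchanged": []}
    for url, (status, body) in current.items():
        if status >= 400:
            # Assumption: any 4xx/5xx answer is treated as a dead link.
            report["dead"].append(url)
        elif url not in previous:
            report["new"].append(url)
        elif previous[url] != fingerprint(body):
            report["changed"].append(url)
        else:
            report["unchanged"].append(url)
    return report


# Hypothetical data: the home page title changed, /news/ is new, /old/ is gone.
previous = {
    "/": fingerprint("<title>Old title</title>"),
    "/about/": fingerprint("about us"),
    "/old/": fingerprint("old page"),
}
current = {
    "/": (200, "<title>New title</title>"),
    "/about/": (200, "about us"),
    "/old/": (404, ""),
    "/news/": (200, "news"),
}
report = diff_crawl(previous, current)
```

Only the pages listed under "new" and "changed" would need to be re-indexed in this sketch, which is why unchanged pages that are fetched again and again are wasted crawl effort.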


Three: when the spider meets dynamic web pages

Disallow: /page/ # block crawling of WordPress pagination

If your website needs it, you can also add the following statements to avoid too many repeated pages:

* Disallow: /category/*/page/* # block paginated category pages
* Disallow: /tag/ # block tag pages
* Disallow: */trackback/ # block Trackback content
* Disallow: /category/* # block all category list pages

What is a spider? A spider, also called a crawler, is actually a program. Its job is to follow your website's URLs layer by layer, read some information, do some simple processing, and then send the results back to a back-end server for centralized processing. We need to understand what spiders like in order to optimize a website well. Next, let's talk about how a spider works.
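To see which URLs the rules above would screen out, here is a rough sketch of wildcard matching for Disallow patterns. It assumes the common convention that `*` matches any run of characters, that a trailing `$` anchors the end of the path, and that a pattern otherwise matches as a prefix. The helper names `rule_to_regex` and `is_blocked` and the sample paths are made up for the illustration. Allow rules and per-spider User-agent groups are left out, and in a real robots file the Disallow lines sit under a User-agent line.

```python
import re

# Disallow patterns copied from the rules in this article.
RULES = [
    "/page/",
    "/category/*/page/*",
    "/tag/",
    "*/trackback/",
    "/category/*",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one Disallow pattern into a regex matched from the path start."""
    anchored_end = rule.endswith("$")
    if anchored_end:
        rule = rule[:-1]
    # Escape the literal pieces, then join them with '.*' where a '*' stood.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored_end else ""))


def is_blocked(path: str, rules=RULES) -> bool:
    """Return True if any Disallow pattern matches the beginning of the path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


# Hypothetical paths on a WordPress-style site.
for path in ["/page/2/", "/category/seo/page/3/", "/tag/robots/",
             "/2020/05/post/trackback/", "/2020/05/post/"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Running a list of real URLs from your own site through a check like this before publishing the robots file helps confirm that article pages stay crawlable while paging, tag, and category lists are screened out.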

Looking through the web logs, I found that spiders were fetching many pages of the site repeatedly, which means the site is not well optimized. So how do we keep spiders from crawling the same pages over and over?
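A log check of this kind can be sketched in a few lines. The example assumes the server writes the widely used "combined" access log format and that spiders can be recognized by the words "spider" or "bot" in the user-agent string. The function name `repeated_spider_hits`, the threshold, and the sample log lines are all hypothetical.

```python
import re
from collections import Counter

# Pulls the request path and the user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def repeated_spider_hits(lines, tokens=("spider", "bot"), min_hits=2):
    """Count spider requests per path and return those fetched repeatedly."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        if any(token in agent for token in tokens):
            counts[match.group("path")] += 1
    return [(path, n) for path, n in counts.most_common() if n >= min_hits]


# Made-up sample lines; real logs would be read from the server's log file.
SPIDER = "Mozilla/5.0 (compatible; Baiduspider/2.0)"
BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
sample = [
    f'203.0.113.5 - - [10/May/2020:10:00:00 +0800] "GET /page/2/ HTTP/1.1" 200 512 "-" "{SPIDER}"',
    f'203.0.113.5 - - [10/May/2020:10:05:00 +0800] "GET /page/2/ HTTP/1.1" 200 512 "-" "{SPIDER}"',
    f'203.0.113.5 - - [10/May/2020:10:09:00 +0800] "GET /page/2/ HTTP/1.1" 200 512 "-" "{SPIDER}"',
    f'203.0.113.5 - - [10/May/2020:10:10:00 +0800] "GET /about/ HTTP/1.1" 200 900 "-" "{SPIDER}"',
    f'198.51.100.7 - - [10/May/2020:10:11:00 +0800] "GET /about/ HTTP/1.1" 200 900 "-" "{BROWSER}"',
]
hits = repeated_spider_hits(sample)
```

Paths that show up at the top of this list with high counts are the first candidates for a Disallow rule in the robots file.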
