FAQ

General
Account & Financial
Crawling
Reporting

Still have a question?
Contact us or just try it out!

Q : Why use a broken link checker?

Checking your website’s links is important for both SEO and user experience.

  • Broken links are not user-friendly
  • Broken links hurt your website’s conversion rate and your sales
  • Broken links hurt your Google SEO ranking: “Check for broken links and correct HTML” appears in Google’s Webmaster Guidelines

As your website gets larger, checking its links manually takes more and more time.
To save time, and therefore money, for your organization, you may find it useful to use a specialized robot to crawl your website.

In addition, you can collect more data, enabling your check to provide more information to your colleagues (webmaster, web editor, server manager, etc.) and to your managers. CrawlForMe does this, and more.

Q : How to check your website?

Checking a website can be a long and painful task. We have put together a checklist of everything that should be checked before launching your website, or while maintaining and monitoring it.
Fortunately, several online automated tools can help you work through this list. We mention the ones we prefer below.

Task                                                              Tool
Browser compatibility                                             BrowserStack
W3C validation                                                    validator.w3.org
Favicon                                                           Woorank
Search engine submission                                          Manual
Spelling, grammar and punctuation                                 WebElexir
Forms, links, images, 404 pages, 301 redirects, protected pages   CrawlForMe
Websites & Internet services monitoring                           InternetVista
Analytics code                                                    CrawlForMe Custom
Working backup system                                             Manual
Traffic loads                                                     Depends on your hosting provider
Secure certificate                                                ssltools.com
HTTP request analyzer                                             Web-Sniffer

Q : How to test your website’s links?

If your website has only a few pages, you may take the time to check the links on each page every once in a while.
As your website grows, you may need a robot to crawl your website and check it for broken links, missing images, unspecified items or HTTPS issues.
To be sure that your website contains no errors, you have to launch crawls frequently. That is why an automated tool with scheduled tasks, such as CrawlForMe, is essential.

Q : What types of payment do you accept?

All transactions are done via PayPal.

Q : Do I pay a monthly fee?

No. When you buy a plan, you buy it for a fixed period. When your plan expires, you need to buy a plan for a new period to continue using the CrawlForMe services.

Q : Do I get an invoice?

Yes, you receive an invoice for every payment. You can view and download your invoices, in PDF format, via the ‘Profile’ → ‘Financial’ tab.

Q : When am I charged?

You are charged the full amount for the chosen period when you buy the plan.

Q : Can I upgrade my plan?

If you realize that your plan doesn’t fit your needs, you can upgrade it. To do so, contact us and we will do it for you.

Q : I’m subject to VAT and I have a VAT number

You can fill in all your VAT information via the ‘Profile’ → ‘Legal entity’ tab. CrawlForMe complies with the laws on VAT. If no VAT is applicable to you, the invoice mentions the corresponding legal provision.

Q : I received a promotional code, what should I do?

A promotional code gives you a free plan, which can be used just like a paid plan. To enter your promotional code, click on the ‘Profile’ → ‘Financial’ → ‘Promo code’ tab. Enter your code in the field and click on ‘Save’. A message will inform you whether your account has been correctly updated.
Note that you can’t use the same promotional code twice!

Q : What do I have to do if I lose my password?

Don’t worry: click on login, then on the ‘Did you forget your password?’ link. Enter your email address and we will send you a new password. Once logged in, you must enter a new password of your choice.

Q : How does CrawlForMe work?

CrawlForMe is a web crawler. Starting from a URL called a seed, it parses the content and follows all the links, going from page to page through the whole depth of your website. All the links and resources are checked, and the results are listed in a report.
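
A minimal sketch of this crawl loop, assuming Python with the third-party ‘requests’ library (illustration only, not CrawlForMe’s actual code):

    from urllib.parse import urljoin, urlparse
    import re
    import requests

    def crawl(seed, max_pages=100):
        """Follow same-site links from the seed and report broken ones."""
        to_visit, seen, errors = [seed], {seed}, []
        while to_visit and len(seen) <= max_pages:
            url = to_visit.pop()
            try:
                resp = requests.get(url, timeout=10)
            except requests.RequestException:
                errors.append((url, "unreachable"))
                continue
            if resp.status_code >= 400:
                errors.append((url, resp.status_code))
                continue
            # Naive link extraction; a real crawler uses an HTML parser.
            for href in re.findall(r'href="([^"]+)"', resp.text):
                link = urljoin(url, href)
                if link not in seen and urlparse(link).netloc == urlparse(seed).netloc:
                    seen.add(link)
                    to_visit.append(link)
        return errors

    print(crawl("https://www.example.com/"))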

A large number of file types are supported, gathered into groups. You can then decide which kinds of resources have to be checked. Where applicable, you can choose to check or omit embedded code. For instance, for JavaScript, you can check that the JS file exists, and you can also ask CrawlForMe to parse the file itself. You may ignore a particular link, a range of links, or even define a regular expression to match or find undesired links.

CrawlForMe produces an online interactive report that you can share, comment on, compare and consult. It contains:

  • Error, Successful, Redirected and Ignored tabs containing all checked links with useful information
  • Error inspector with error links highlighted in your source code
  • Search filters to quickly analyse data
  • CSV exports

CrawlForMe takes advantage of multi-threading, checking multiple links in parallel to reduce crawling time. To avoid server overload and potential DDoS-like effects, a limit on concurrent access is set, as well as a delay between requests.
CrawlForMe uses the latest technologies available on the market to give the best performance and reliability in all circumstances.
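
A sketch of this kind of throttled parallel checking, assuming Python’s standard thread pool and the third-party ‘requests’ library (the delay and worker count are illustrative values, not CrawlForMe’s settings):

    import time
    from concurrent.futures import ThreadPoolExecutor
    import requests

    REQUEST_DELAY = 0.5   # assumed delay between requests per worker
    MAX_WORKERS = 4       # assumed concurrency limit

    def check(url):
        # HEAD is enough to read the status without downloading the body.
        try:
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            status = None
        time.sleep(REQUEST_DELAY)  # throttle to avoid overloading the server
        return url, status

    urls = ["https://www.example.com/", "https://www.example.com/about"]
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for url, status in pool.map(check, urls):
            print(url, status)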

Q : What is a seed?

A seed is a starting point for the crawl of your website. Since your site can sometimes be accessed in different ways, you can define different seeds. For example, there can be a B2B part of the website accessed through business.mycompany.com and a B2C one accessed by private.mycompany.com.

Q : What is a scheduled crawl?

This feature allows users to define when their websites should be crawled. It can be either a single crawl or a recurrent crawl (daily, weekly, monthly, yearly).

Q : What is a resource?

A resource represents a part of your website: a web page or an online file. These are objects that are ‘available’, i.e. that ‘can be located and accessed’, through your website. This can be a page, a file (.css, .js), an image, etc. Each resource may have many child resources. Example: a web page with three images and a CSS file.

Q : What is a unique link?

A unique link is a link to a resource. Even if this resource is referenced by many pages of your website, CrawlForMe counts only one resource, hence the term ‘unique’.

Q : Does CrawlForMe use the robots.txt file?

Website owners use the /robots.txt file to give instructions about their site to web robots. By default, CrawlForMe follows the instructions in robots.txt. You can choose to ignore the robots.txt file; to do so, just uncheck the box in the crawler’s options.
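
How a crawler honours robots.txt can be sketched with Python’s standard-library robot parser (the user-agent string is an assumption for illustration):

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://www.example.com/robots.txt")
    rp.read()  # fetch and parse the robots.txt file

    # Only fetch a page if robots.txt allows it for our user agent.
    if rp.can_fetch("CrawlForMe", "https://www.example.com/private/page.html"):
        print("allowed to crawl")
    else:
        print("disallowed by robots.txt")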

Q : When I create a website, why doesn’t the time correspond with the ‘real‘ time?

That may be because you haven’t saved your time zone. If that’s the case, CrawlForMe displays the time as GMT+0. To edit this, go to ‘Profile’ → ‘Legal Entity’ → ‘Edit’, select your time zone from the list, and click ‘Save’.

Q : Why does the crawl take such a long time?

There can be many reasons:

  • The more resources your website has, the longer the crawl can take.
  • Some web pages may be slow.
  • Sometimes, CrawlForMe waits to avoid overloading your website.

If you want to check the progress of the crawl, go to your dashboard and click on the ‘Runtime‘ button.

Q : In my report, why do I have a unique link reported as error 403 but the link works fine?

For a unique link, CrawlForMe first reads the HTTP response header. It’s possible that the server returns a 403 error for that link but that your website redirects you elsewhere in that case, so the underlying response is 403 Forbidden even though you never see it in your browser.
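
You can reproduce this check yourself by reading the raw response header without following redirects; a sketch with the third-party ‘requests’ library (the URL is a placeholder):

    import requests

    resp = requests.head("https://www.example.com/members-only",
                         allow_redirects=False, timeout=10)
    print(resp.status_code)               # e.g. 403, even if a browser ends up elsewhere
    print(resp.headers.get("Location"))   # redirect target, if the server sends one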

Q : Why do I have a partial report of 5000 unique links?

Because you are in trial mode. In this mode, the report shows only the first 5,000 unique links. Once you are convinced by CrawlForMe (and we’re sure that will soon be the case), just subscribe to a plan and launch a new crawl.

Q : I don’t understand ignored patterns

Ignored patterns are used to skip the analysis of resources whose name (the full path of the link) matches the pattern. Each ignored pattern has the pattern itself as its name, a type (parsed value or native regex), and an action (find or match).

Parsed value
This is the most user-friendly pattern type. Just enter a string to be found in the URL of the normalised link; the behaviour is similar to the Windows search. The action associated with this pattern is always FIND. Some special characters, or wildcards, are allowed. To use a wildcard’s literal value, escape it by putting a backslash ‘\’ in front of it. For example, search for “*\?*” to match a URL containing a literal ‘?’ character, like http://www.example.com/search.php?query=some_text.
Wildcards
* and ?. Use * to match zero or more characters, and ? to match any single character. Example: to ignore all links with ‘foo’ in the URL, use *foo* as an ignored pattern. Another example: to ignore the links /bar1.html and /bar2.html, use /bar?.html as an ignored pattern. (A sketch of this wildcard matching follows below.)

Escaped characters: -, ^, $, [, ], .
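
A sketch of this wildcard matching, using Python’s fnmatch module as a stand-in for CrawlForMe’s matcher (note that fnmatch has its own extra rules for ‘[’ and ‘]’):

    from fnmatch import fnmatchcase

    urls = [
        "http://www.example.com/foo/page.html",
        "http://www.example.com/bar1.html",
        "http://www.example.com/bar2.html",
    ]
    for url in urls:
        print(url, "matches *foo*:", fnmatchcase(url, "*foo*"))              # * = zero or more characters
        print(url, "matches */bar?.html:", fnmatchcase(url, "*/bar?.html"))  # ? = any single character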
Native regex
This pattern type is the most powerful. It consists of a valid regular expression, which can be compiled. You can choose between two actions: FIND and MATCH. The difference between the two is the scope of the pattern: FIND looks for occurrences of the pattern anywhere within the string, while MATCH attempts to match the entire string against the pattern.
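
In Python terms, FIND behaves like re.search and MATCH like re.fullmatch (a sketch, not CrawlForMe’s code):

    import re

    url = "http://www.example.com/archive/2019/post.html"

    print(bool(re.search(r"/archive/", url)))                      # FIND: True
    print(bool(re.fullmatch(r"/archive/", url)))                   # MATCH: False, not the whole string
    print(bool(re.fullmatch(r"http://.*/archive/.*\.html", url)))  # MATCH: True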

For more information on regular expressions, we recommend this article on Wikipedia: http://en.wikipedia.org/wiki/Regular_expression. If you’re looking for a regular expression ‘recipe’, you can search the regexlib.com Regular Expression Library for solutions to common tasks.


Q : What happens if I have a URL in the text of my website?

Within the body tag, CrawlForMe only parses links within the following tags: a, link, img, iframe and object. Textual content is not parsed.
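
A sketch of this tag-based extraction, using Python’s standard-library HTML parser (the attribute map is our assumption about which attribute carries the URL in each tag):

    from html.parser import HTMLParser

    URL_ATTRS = {"a": "href", "link": "href", "img": "src",
                 "iframe": "src", "object": "data"}

    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            wanted = URL_ATTRS.get(tag)
            for name, value in attrs:
                if name == wanted and value:
                    self.links.append(value)

    parser = LinkExtractor()
    parser.feed('<a href="/page.html">see http://in-text.example</a><img src="/logo.png">')
    print(parser.links)  # ['/page.html', '/logo.png'] — the URL in the text is ignored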

Q : What happens if I have a URL in my JavaScript?

To parse JavaScript, you must first check the corresponding option in the checking configuration.
Once this is done, CrawlForMe parses ‘script’ tags whose type is ‘text/javascript’, as well as the events onfocus, onblur, onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmousemove, onmouseout, onkeypress, onkeydown and onkeyup. Inside a script, CrawlForMe parses well-known redirect mechanisms such as document.location, window.href, top.location, top.href, top.replace and top.open, as well as correctly formatted links. All other URLs in a script, such as a URL in a code comment, are not analysed.
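
A rough sketch of spotting such redirect calls inside a script with a regular expression (CrawlForMe’s real parser is more thorough; the pattern below only covers simple string assignments):

    import re

    script = """
    function go() { document.location = "http://www.example.com/next.html"; }
    // see http://www.example.com/in-a-comment.html (ignored: not a redirect call)
    """

    pattern = re.compile(
        r'(?:document\.location|window\.href|top\.location)\s*=\s*"([^"]+)"'
    )
    print(pattern.findall(script))  # ['http://www.example.com/next.html']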

Q : I can’t find my advertising links

If your advertising links are embedded in JavaScript code, it’s possible that CrawlForMe did not analyse them, depending on your JavaScript configuration.
If the URLs of your ads are built dynamically by JavaScript, it is also normal that we don’t find them, because CrawlForMe does not interpret JavaScript.

Q : How can I share a report without giving my login and password?

When your report is complete, you can simply share it: we provide a link with an external key that can be pasted into a browser.
To generate a share key, go to your dashboard and click on the ‘Results’ tab. Next to the report, you’ll find an icon to share it.

Q : Do you have a demo report and website?

The best way to discover CrawlForMe and all its functionalities is to use it on a ‘broken website’. That is why we have developed the test website CheckMyBrokenLinks.com.

On CheckMyBrokenLinks.com, you will find useful information about crawling, HTTP status codes and robots.txt. But the site also contains broken links and missing resources (pictures, JS or CSS files). These resources generate various kinds of errors in the demo report.

A demo report for the test site described above has been generated by CrawlForMe. We invite you to open this report and discover the great functionalities of the CrawlForMe online report on a ‘broken website’:

  • Error, Successful, Redirected and Ignored tabs containing all checked links with useful information
  • Error inspector with error links highlighted in your source code
  • Search filters to quickly analyse data
  • CSV exports

But if you prefer to test the entire process, feel free to register and launch a crawl of CheckMyBrokenLinks.com by yourself; you’ll see how easy it is!

Q : What are the HTTP response status codes?

In the reports generated by CrawlForMe, each link is associated with an HTTP response status code. Below you will find information about these codes. The best known are code 200 for a successful request and code 404 for a page not found, but there are dozens of other codes.

See below the list of all status codes.

Informational 1xx

100 Continue
101 Switching Protocols

Successful 2xx

200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content

Redirection 3xx

300 Multiple Choices
301 Moved Permanently
302 Found
303 See Other
304 Not Modified
305 Use Proxy
306 Switch Proxy
307 Temporary Redirect

Client Error 4xx

400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Long
415 Unsupported Media Type
416 Requested Range Not Satisfiable
417 Expectation Failed

Server Error 5xx

500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
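
The first digit of a status code gives its class; a tiny sketch:

    # Map the leading digit of an HTTP status code to its class name.
    CLASSES = {1: "Informational", 2: "Successful", 3: "Redirection",
               4: "Client Error", 5: "Server Error"}

    for code in (200, 301, 404, 503):
        print(code, CLASSES[code // 100])  # e.g. "404 Client Error"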

Q : How can I correct my broken links?

For each error found and listed in the report, a link to the CrawlForMe code inspector is available.
The CrawlForMe inspector highlights the links in error in the source code of the page.
You just have to correct the line in your code and relaunch a crawl to verify the correction.