PHP Sitemap - Scan problem HTML5

Started by txmodxoops, July 10, 2016, 15:42:31 PM


txmodxoops

Hi Elmar!

I have a small question.

The code:
if ($content[0] != "text/html")
{
    echo "Info: $url is not a website: $content[0]" . NL;
    return false;
}


This causes a problem with HTML5.

See the reference here: http://www.w3schools.com/tags/tag_meta.asp

Elmar

Do you mean this?

HTML 4.01: <meta http-equiv="content-type" content="text/html; charset=UTF-8">
HTML5: <meta charset="UTF-8">

Regards
Elmar


Elmar

Quote from: txmodxoops on July 10, 2016, 15:42:31 PM
if ($content[0] != "text/html")
{
    echo "Info: $url is not a website: $content[0]" . NL;
    return false;
}


This checks the Content-Type HTTP header that the web server sends before it sends the file. It does not matter which meta tag is set in the HTML file.
See https://en.wikipedia.org/wiki/List_of_HTTP_header_fields
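To illustrate the difference, here is a minimal sketch of how a Content-Type response header value can be split so that index 0 holds the bare media type. The function name `parseContentType` is hypothetical and not taken from the sitemap script. The `$content[0]` comparison in the quoted code suggests the script does something similar, but that is an assumption.

```php
<?php
// Hypothetical helper, not part of the sitemap script.
// Splits a Content-Type header value such as "text/html; charset=UTF-8"
// into its parts, so that index 0 holds the bare media type.
function parseContentType(string $headerValue): array
{
    // Split on ";" and trim whitespace around each part.
    $parts = array_map('trim', explode(';', $headerValue));
    // Media types are case-insensitive, so normalize to lowercase.
    $parts[0] = strtolower($parts[0]);
    return $parts;
}

// The value comes from the HTTP response header, not from a <meta> tag in the page.
$content = parseContentType('text/html; charset=UTF-8');
var_dump($content[0] === 'text/html');

// A missing or empty header yields an empty media type, which fails a "text/html" check.
$empty = parseContentType('');
var_dump($empty[0] === '');
```

With an input like this, `<meta charset="UTF-8">` versus the HTML 4.01 form makes no difference to the result, because the page body is never inspected.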

txmodxoops

I raised this problem because I receive this message by email from the cron job:

Scan http://www.mywebsite.org
Info: http://www.mywebsite.org is not a website:

Done.
sitemap.xml created.

In fact, no URLs are created if the page does not use this tag, because of the $content[0] check in the Scan function.

HTML5 simply uses this tag: <meta charset="UTF-8">

I use HTML5 for my web site  :)

Elmar

What web server software do you use?

I assume www.mywebsite.org is a placeholder. What's the real URL, so that I can test it?

txmodxoops

Then!

The server uses Linux as a system

The CMS is XOOPS, and I've already given the link to my site in previous posts :)

Elmar

Quote from: txmodxoops on July 11, 2016, 14:27:46 PM
The server uses Linux as a system

Your web server software is Apache.


Your server is still blocking my IP address. I sent you a mail about that.

I tried version 2.0-test2 and it worked fine, even with HTML5. But I did not run a full sitemap generation, just a few pages, because of the risk of getting blocked with another IP too.

Could a specific page be the problem? What does the actual info line say?

txmodxoops

http://www.txmodxoops.org/sitemap-2.0-test2.php = I ran this anonymously and got the following output:

Scan http://www.txmodxoops.org
Scan http://www.txmodxoops.org/modules/contact/index.php
Skip page anchor: #modalLogin
Scan http://www.txmodxoops.org/modules/profile/register.php
Skip page anchor: #modalLogin
Scan http://www.txmodxoops.org/modules/profile
Redirected to: http://www.txmodxoops.org/modules/profile/
Skip page anchor: #modalLogin
Scan http://www.txmodxoops.org/modules/xdonations/index.php
Skip page anchor: #modalLogin
Skip page anchor: #comments
Skip page anchor: #tweets
Scan http://www.txmodxoops.org/modules/mylinks/index.php
Scan http://www.txmodxoops.org/modules/xnews/index.php
Scan http://www.txmodxoops.org/modules/imprint/index.php
Scan http://www.txmodxoops.org/modules/xsitemap/index.php
Scan http://www.txmodxoops.org/modules/downloads/index.php
Scan http://www.txmodxoops.org/modules/meteoit/index.php
Scan http://www.txmodxoops.org/modules/xoopstube/index.php
Scan http://www.txmodxoops.org/modules/tutorials/index.php
Scan http://www.txmodxoops.org/modules/mythemes/index.php
Scan http://www.txmodxoops.org/modules/mymodules/index.php
Scan http://www.txmodxoops.org/modules/partads/index.php
Scan http://www.txmodxoops.org/modules/xoopspoll/index.php
Scan http://www.txmodxoops.org/modules/xdonations
Redirected to: http://www.txmodxoops.org/modules/xdonations/
Scan http://www.txmodxoops.org/modules/xnews
Redirected to: http://www.txmodxoops.org/modules/xnews/
Scan http://www.txmodxoops.org/modules/tutorials
Redirected to: http://www.txmodxoops.org/modules/tutorials/
Scan http://www.txmodxoops.org/modules/downloads
Redirected to: http://www.txmodxoops.org/modules/downloads/
Scan http://www.txmodxoops.org/modules/mylinks
Redirected to: http://www.txmodxoops.org/modules/mylinks/
Scan http://www.txmodxoops.org/modules/codelink
Scan http://www.txmodxoops.org/modules
Redirected to: http://www.txmodxoops.org/modules/
Scan http://www.txmodxoops.org/errorpage.php?error=404
Scan http://www.txmodxoops.org/modules/mymodules
Redirected to: http://www.txmodxoops.org/modules/mymodules/
Scan http://www.txmodxoops.org/modules/mythemes
Redirected to: http://www.txmodxoops.org/modules/mythemes/
Skip page anchor: #comments
Skip page anchor: #tweets
Skip page anchor: #comments
Skip page anchor: #tweets
Skip page anchor: #comments
Skip page anchor: #tweets
Scan http://www.txmodxoops.org/modules/contact
Redirected to: http://www.txmodxoops.org/modules/contact/
Skip page anchor: #modalLogin
Scan http://www.txmodxoops.org/register.php
Scan http://www.txmodxoops.org/index.php
Scan http://www.txmodxoops.org/modules/downloads/singlefile.php?lid=130
Scan http://www.txmodxoops.org/modules/downloads/singlefile.php?lid=129
Scan http://www.txmodxoops.org/modules/downloads/singlefile.php?lid=128
Scan http://www.txmodxoops.org/modules/downloads/singlefile.php?lid=127
Scan http://www.txmodxoops.org/xnews/argomenti/1/xoops.html
Scan http://www.txmodxoops.org/xnews/articoli/84/installazione-e-aggiornamento-con-composer.html
Scan http://www.txmodxoops.org/userinfo.php?uid=1
Scan http://www.txmodxoops.org/xnews/stampa/84/installazione-e-aggiornamento-con-composer.html
Scan http://www.txmodxoops.org/xnews/pdf/84/installazione-e-aggiornamento-con-composer.pdf
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=13&lid=21
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=11&lid=18
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=1&lid=19
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=12&lid=17
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=1&lid=12
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=4&lid=16
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=5&lid=14
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=1&lid=15
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=1&lid=13
Scan http://www.txmodxoops.org/modules/mylinks/singlelink.php?cid=1&lid=7
Scan http://www.txmodxoops.org/modules/downloads/singlefile.php?lid=35
Scan http://www.txmodxoops.org/modules/downloads/singlefile.php?lid=34
Scan http://www.txmodxoops.org/modules/downloads/singlefile.php?lid=44
Scan http://www.txmodxoops.org/modules/downloads/singlefile.php?lid=42
Scan http://www.txmodxoops.org/tutorials/argomenti/4/php.html
Scan http://www.txmodxoops.org/tutorials/articoli/340/cosa-sono-i-namespace-e-come-si-usano.html
Scan http://www.txmodxoops.org/tutorials/stampa/340/cosa-sono-i-namespace-e-come-si-usano.html
Scan http://www.txmodxoops.org/tutorials/pdf/340/cosa-sono-i-namespace-e-come-si-usano.pdf
Skip page anchor: #comments
Skip page anchor: #tweets
Folder: cache
Done.
v20t2_sitemap.xml created.

txmodxoops

I checked on the server and your IP is not blocked. Why do you say it is blocked?

How?

Elmar

Quote from: txmodxoops on July 11, 2016, 18:41:17 PM
I checked on the server and your IP is not blocked. Why do you say it is blocked?

Because one of my IPs can connect to your server and another cannot. That second IP was falsely reported as an attack because of the scanning with the sitemap script during the bug hunt (remember? I told you about that a few weeks ago). That's why I say your server blocks one of my IPs. I don't care that my IP is blocked. It's just bad for you, because it makes it harder for me to figure out why the script does not work correctly when you switch to HTML5.


Quote from: txmodxoops on July 11, 2016, 18:41:17 PM
How?

How should I know? It's your system.



txmodxoops

It might be sufficient to change the script like this:

// Check content type for website
/*if ($content[0] != "text/html")
{
    echo "Info: $url is not a website: $content[0]" . NL;
    return false;
}*/


This applies to both XHTML and HTML5.

I commented out the check to make it work with HTML5.

Elmar

I am sure that the problem, namely that no Content-Type header is sent by your server when you use HTML5, is caused by a configuration on your server. When you run the script locally, no Content-Type header is sent. When I access your site, the Content-Type header is sent correctly, even on your HTML5 pages. But I don't want to discuss this behavior any longer. The sitemap script now has an additional parameter to handle such situations: "IGNORE_EMPTY_CONTENT_TYPE". Under normal conditions, I don't recommend setting it to true.
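
For readers wondering how such an option differs from commenting out the whole check, here is a rough sketch. It is not the script's actual code: the function `isWebsite` and its `$ignoreEmptyContentType` argument are hypothetical and only mirror the name of the parameter mentioned above. The idea shown is that only an empty media type is let through, while a non-HTML type such as a PDF is still rejected.

```php
<?php
// Hypothetical helper, not taken from the sitemap script.
// $mediaType is assumed to be the bare media type from the Content-Type
// header ("" when the server sent no such header).
function isWebsite(string $mediaType, bool $ignoreEmptyContentType = false): bool
{
    if ($mediaType === '') {
        // No Content-Type header: accept the page only if the option is enabled.
        return $ignoreEmptyContentType;
    }
    // Any other value must still be exactly text/html.
    return $mediaType === 'text/html';
}

var_dump(isWebsite('', true));                // empty header, option enabled
var_dump(isWebsite('application/pdf', true)); // non-HTML type, option enabled
```

Compared with removing the check entirely, this keeps PDFs and other non-HTML responses from being parsed as web pages.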