Some Wikipedia URLs cannot be read with fopen() in PHP.
■How to fix it
Call "ini_set('user_agent', '...');" just before "fopen()" so that the request carries a User-Agent header.
■Problem
On "Foreign Mom Next for iPad iPhone PC" (http://www.mylovewill.com/eng/index.php?LANG=en),
translate the page "http://simple.wikipedia.org/wiki/Mathematics"
and click the link "Fermat's_last_theorem".
An error is shown:
"Warning: fopen(http://simple.wikipedia.org/wiki/Fermat%27s_last_theorem)
[function.fopen]: failed to open stream: HTTP request failed!
HTTP/1.0 403 Forbidden in /home2/mylovewi/public_html/eng/eng.php on line 152"
However, the link "Pythagorean theorem" works fine.
■Investigation
At first I suspected that the special character (') was the cause.
I tried the urldecode() and urlencode() functions, but they did not fix the problem.
To investigate, I wrote test program A and ran it on another server,
capturing and analyzing the packets with tcpdump.
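As a side check on the apostrophe hypothesis, the short PHP sketch below (not part of the original investigation) shows that the URL passed to fopen() already contains the standard RFC 3986 percent-encoding of the title. rawurlencode() turns (') into %27 and leaves the underscores alone, so re-encoding or decoding the URL cannot produce a "more correct" request; it can only produce the same string or a different title. The variable names are illustrative.

```php
<?php
// Compare the article title with the encoded form used in the fopen() URL.
$title   = "Fermat's_last_theorem";
$encoded = rawurlencode($title);                      // apostrophe -> %27
$decoded = rawurldecode("Fermat%27s_last_theorem");   // %27 -> apostrophe

echo $encoded, "\n";   // Fermat%27s_last_theorem
echo $decoded, "\n";   // Fermat's_last_theorem
```

The encoded string matches the request line in the tcpdump log ("GET /wiki/Fermat%27s_last_theorem HTTP/1.0"), which points away from URL encoding as the cause.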
■test program A
# cat test.php
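<?php
// Open the page with the default settings (no User-Agent header is sent).
$f = fopen("http://simple.wikipedia.org/wiki/Fermat%27s_last_theorem", "rb");
if ($f) {
    // Copy the response body to standard output in 8 KB chunks.
    while (!feof($f)) {
        echo fread($f, 8192);
    }
    fclose($f);
}
?>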
#
■tcpdump log
# tcpdump -X -s 1600 port http
17:57:33.364559 IP 38.99.127.150.43966 > rr.pmtpa.wikimedia.org.http: S 1139268898:1139268898(0) win 5840 <mss 1460,sackOK,timestamp 1023836321 0,nop,wscale 7>
0x0000: 4500 003c 5d83 4000 4006 ceec 2663 7f96 E..<].@.@...&c..
0x0010: d050 9802 abbe 0050 43e7 dd22 0000 0000 .P.....PC.."....
0x0020: a002 16d0 9823 0000 0204 05b4 0402 080a .....#..........
0x0030: 3d06 80a1 0000 0000 0103 0307 =...........
17:57:33.446400 IP rr.pmtpa.wikimedia.org.http > 38.99.127.150.43966: S 3396924998:3396924998(0) ack 1139268899 win 5792 <mss 1460,sackOK,timestamp 630610046 1023836321,nop,wscale 9>
0x0000: 4500 003c 0000 4000 3406 3870 d050 9802 E..<..@.4.8p.P..
0x0010: 2663 7f96 0050 abbe ca78 f646 43e7 dd23 &c...P...x.FC..#
0x0020: a012 16a0 596c 0000 0204 05b4 0402 080a ....Yl..........
0x0030: 2596 587e 3d06 80a1 0103 0309 %.X~=.......
17:57:33.446422 IP 38.99.127.150.43966 > rr.pmtpa.wikimedia.org.http: . ack 1 win 46 <nop,nop,timestamp 1023836403 630610046>
0x0000: 4500 0034 5d84 4000 4006 cef3 2663 7f96 E..4].@.@...&c..
0x0010: d050 9802 abbe 0050 43e7 dd23 ca78 f647 .P.....PC..#.x.G
0x0020: 8010 002e 9e5a 0000 0101 080a 3d06 80f3 .....Z......=...
0x0030: 2596 587e %.X~
17:57:33.446444 IP 38.99.127.150.43966 > rr.pmtpa.wikimedia.org.http: P 1:45(44) ack 1 win 46 <nop,nop,timestamp 1023836403 630610046>
0x0000: 4500 0060 5d85 4000 4006 cec6 2663 7f96 E..`].@.@...&c..
0x0010: d050 9802 abbe 0050 43e7 dd23 ca78 f647 .P.....PC..#.x.G
0x0020: 8018 002e 9415 0000 0101 080a 3d06 80f3 ............=...
0x0030: 2596 587e 4745 5420 2f77 696b 692f 4665 %.X~GET./wiki/Fe
0x0040: 726d 6174 2532 3773 5f6c 6173 745f 7468 rmat%27s_last_th
0x0050: 656f 7265 6d20 4854 5450 2f31 2e30 0d0a eorem.HTTP/1.0..
17:57:33.528353 IP rr.pmtpa.wikimedia.org.http > 38.99.127.150.43966: . ack 45 win 12 <nop,nop,timestamp 630610054 1023836403>
0x0000: 4500 0034 aec1 4000 3406 89b6 d050 9802 E..4..@.4....P..
0x0010: 2663 7f96 0050 abbe ca78 f647 43e7 dd4f &c...P...x.GC..O
0x0020: 8010 000c 9e48 0000 0101 080a 2596 5886 .....H......%.X.
0x0030: 3d06 80f3 =...
17:57:33.528373 IP 38.99.127.150.43966 > rr.pmtpa.wikimedia.org.http: P 45:75(30) ack 1 win 46 <nop,nop,timestamp 1023836485 630610054>
0x0000: 4500 0052 5d86 4000 4006 ced3 2663 7f96 E..R].@.@...&c..
0x0010: d050 9802 abbe 0050 43e7 dd4f ca78 f647 .P.....PC..O.x.G
0x0020: 8018 002e cf67 0000 0101 080a 3d06 8145 .....g......=..E
0x0030: 2596 5886 486f 7374 3a20 7369 6d70 6c65 %.X.Host:.simple
0x0040: 2e77 696b 6970 6564 6961 2e6f 7267 0d0a .wikipedia.org..
0x0050: 0d0a ..
17:57:33.610301 IP rr.pmtpa.wikimedia.org.http > 38.99.127.150.43966: . ack 75 win 12 <nop,nop,timestamp 630610062 1023836485>
0x0000: 4500 0034 aec2 4000 3406 89b5 d050 9802 E..4..@.4....P..
0x0010: 2663 7f96 0050 abbe ca78 f647 43e7 dd6d &c...P...x.GC..m
0x0020: 8010 000c 9dd0 0000 0101 080a 2596 588e ............%.X.
0x0030: 3d06 8145 =..E
17:57:33.655274 IP rr.pmtpa.wikimedia.org.http > 38.99.127.150.43966: P 1:410(409) ack 75 win 12 <nop,nop,timestamp 630610067 1023836485>
0x0000: 4500 01cd aec3 4000 3406 881b d050 9802 E.....@.4....P..
0x0010: 2663 7f96 0050 abbe ca78 f647 43e7 dd6d &c...P...x.GC..m
0x0020: 8018 000c 3ba7 0000 0101 080a 2596 5893 ....;.......%.X.
0x0030: 3d06 8145 4854 5450 2f31 2e30 2034 3033 =..EHTTP/1.0.403
0x0040: 2046 6f72 6269 6464 656e 0d0a 4461 7465 .Forbidden..Date
0x0050: 3a20 5468 752c 2031 3620 5365 7020 3230 :.Thu,.16.Sep.20
0x0060: 3130 2030 303a 3537 3a33 3320 474d 540d 10.00:57:33.GMT.
0x0070: 0a53 6572 7665 723a 2041 7061 6368 650d .Server:.Apache.
0x0080: 0a43 6163 6865 2d43 6f6e 7472 6f6c 3a20 .Cache-Control:.
0x0090: 7072 6976 6174 652c 2073 2d6d 6178 6167 private,.s-maxag
0x00a0: 653d 302c 206d 6178 2d61 6765 3d30 2c20 e=0,.max-age=0,.
0x00b0: 6d75 7374 2d72 6576 616c 6964 6174 650d must-revalidate.
0x00c0: 0a56 6172 793a 2041 6363 6570 742d 456e .Vary:.Accept-En
0x00d0: 636f 6469 6e67 0d0a 436f 6e74 656e 742d coding..Content-
0x00e0: 4c65 6e67 7468 3a20 3132 300d 0a43 6f6e Length:.120..Con
0x00f0: 7465 6e74 2d54 7970 653a 2074 6578 742f tent-Type:.text/
0x0100: 6874 6d6c 0d0a 582d 4361 6368 653a 204d html..X-Cache:.M
0x0110: 4953 5320 6672 6f6d 2073 7137 362e 7769 ISS.from.sq76.wi
0x0120: 6b69 6d65 6469 612e 6f72 670d 0a58 2d43 kimedia.org..X-C
0x0130: 6163 6865 2d4c 6f6f 6b75 703a 204d 4953 ache-Lookup:.MIS
0x0140: 5320 6672 6f6d 2073 7137 362e 7769 6b69 S.from.sq76.wiki
0x0150: 6d65 6469 612e 6f72 673a 3331 3238 0d0a media.org:3128..
0x0160: 582d 4361 6368 653a 204d 4953 5320 6672 X-Cache:.MISS.fr
0x0170: 6f6d 2073 7137 372e 7769 6b69 6d65 6469 om.sq77.wikimedi
0x0180: 612e 6f72 670d 0a58 2d43 6163 6865 2d4c a.org..X-Cache-L
0x0190: 6f6f 6b75 703a 204d 4953 5320 6672 6f6d ookup:.MISS.from
0x01a0: 2073 7137 372e 7769 6b69 6d65 6469 612e .sq77.wikimedia.
0x01b0: 6f72 673a 3830 0d0a 436f 6e6e 6563 7469 org:80..Connecti
0x01c0: 6f6e 3a20 636c 6f73 650d 0a0d 0a on:.close....
17:57:33.655289 IP 38.99.127.150.43966 > rr.pmtpa.wikimedia.org.http: . ack 410 win 54 <nop,nop,timestamp 1023836612 630610067>
0x0000: 4500 0034 5d87 4000 4006 cef0 2663 7f96 E..4].@.@...&c..
0x0010: d050 9802 abbe 0050 43e7 dd6d ca78 f7e0 .P.....PC..m.x..
0x0020: 8010 0036 9b89 0000 0101 080a 3d06 81c4 ...6........=...
0x0030: 2596 5893 %.X.
17:57:33.655345 IP 38.99.127.150.43966 > rr.pmtpa.wikimedia.org.http: F 75:75(0) ack 410 win 54 <nop,nop,timestamp 1023836612 630610067>
0x0000: 4500 0034 5d88 4000 4006 ceef 2663 7f96 E..4].@.@...&c..
0x0010: d050 9802 abbe 0050 43e7 dd6d ca78 f7e0 .P.....PC..m.x..
0x0020: 8011 0036 9b88 0000 0101 080a 3d06 81c4 ...6........=...
0x0030: 2596 5893 %.X.
17:57:33.660328 IP rr.pmtpa.wikimedia.org.http > 38.99.127.150.43966: P 410:530(120) ack 75 win 12 <nop,nop,timestamp 630610067 1023836485>
0x0000: 4500 00ac aec4 4000 3406 893b d050 9802 E.....@.4..;.P..
0x0010: 2663 7f96 0050 abbe ca78 f7e0 43e7 dd6d &c...P...x..C..m
0x0020: 8018 000c 0c02 0000 0101 080a 2596 5893 ............%.X.
0x0030: 3d06 8145 5363 7269 7074 7320 7368 6f75 =..EScripts.shou
0x0040: 6c64 2075 7365 2061 6e20 696e 666f 726d ld.use.an.inform
0x0050: 6174 6976 6520 5573 6572 2d41 6765 6e74 ative.User-Agent
0x0060: 2073 7472 696e 6720 7769 7468 2063 6f6e .string.with.con
0x0070: 7461 6374 2069 6e66 6f72 6d61 7469 6f6e tact.information
0x0080: 2c20 6f72 2074 6865 7920 6d61 7920 6265 ,.or.they.may.be
0x0090: 2049 502d 626c 6f63 6b65 6420 7769 7468 .IP-blocked.with
0x00a0: 6f75 7420 6e6f 7469 6365 2e0a out.notice..
17:57:33.660379 IP 38.99.127.150.43966 > rr.pmtpa.wikimedia.org.http: R 1139268973:1139268973(0) win 0
0x0000: 4500 0028 0000 4000 4006 2c84 2663 7f96 E..(..@.@.,.&c..
0x0010: d050 9802 abbe 0050 43e7 dd6d 0000 0000 .P.....PC..m....
0x0020: 5004 0000 d430 0000 P....0..
17:57:33.660336 IP rr.pmtpa.wikimedia.org.http > 38.99.127.150.43966: F 530:530(0) ack 75 win 12 <nop,nop,timestamp 630610067 1023836485>
0x0000: 4500 0034 aec5 4000 3406 89b2 d050 9802 E..4..@.4....P..
0x0010: 2663 7f96 0050 abbe ca78 f858 43e7 dd6d &c...P...x.XC..m
0x0020: 8011 000c 9bb9 0000 0101 080a 2596 5893 ............%.X.
0x0030: 3d06 8145 =..E
17:57:33.660392 IP 38.99.127.150.43966 > rr.pmtpa.wikimedia.org.http: R 1139268973:1139268973(0) win 0
0x0000: 4500 0028 0000 4000 4006 2c84 2663 7f96 E..(..@.@.,.&c..
0x0010: d050 9802 abbe 0050 43e7 dd6d 0000 0000 .P.....PC..m....
0x0020: 5004 0000 d430 0000 P....0..
17:57:33.737229 IP rr.pmtpa.wikimedia.org.http > 38.99.127.150.43966: . ack 76 win 12 <nop,nop,timestamp 630610075 1023836612>
0x0000: 4500 0034 aec6 4000 3406 89b1 d050 9802 E..4..@.4....P..
0x0010: 2663 7f96 0050 abbe ca78 f859 43e7 dd6e &c...P...x.YC..n
0x0020: 8010 000c 9b31 0000 0101 080a 2596 589b .....1......%.X.
0x0030: 3d06 81c4 =...
17:57:33.737247 IP 38.99.127.150.43966 > rr.pmtpa.wikimedia.org.http: R 1139268974:1139268974(0) win 0
0x0000: 4500 0028 0000 4000 4006 2c84 2663 7f96 E..(..@.@.,.&c..
0x0010: d050 9802 abbe 0050 43e7 dd6e 0000 0000 .P.....PC..n....
0x0020: 5004 0000 d42f 0000 P..../..
■Log analysis and countermeasure
The HTTP request is sent in two packets, timestamped
17:57:33.446444 and 17:57:33.528373.
The request sent by fopen() contains only the GET line and a Host: header; there is no User-Agent header.
The response status is "HTTP/1.0 403 Forbidden",
and the response body says:
"Scripts should use an informative User-Agent string with
contact information, or they may be IP-blocked without notice."
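The same diagnosis can probably be made from PHP itself without tcpdump. The sketch below assumes PHP 7.1 or later (for the nullable return type) and uses two real features of the http:// wrapper: the 'ignore_errors' context option, which lets fopen() succeed and return the body even for a 4xx/5xx status, and the 'wrapper_data' entry of stream_get_meta_data(), which holds the response header lines. The function names parse_status_code() and fetch_with_status() are hypothetical helpers, not part of the original programs.

```php
<?php
/** Return the numeric code from a status line such as "HTTP/1.0 403 Forbidden". */
function parse_status_code(string $statusLine): int
{
    // Status line: HTTP-version SP status-code SP reason-phrase
    $parts = explode(' ', $statusLine, 3);
    return isset($parts[1]) ? (int) $parts[1] : 0;
}

/** Fetch $url; return [status code, body], or null if the stream could not be opened. */
function fetch_with_status(string $url): ?array
{
    // ignore_errors: do not fail on error statuses, so the body stays readable.
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $f = fopen($url, 'rb', false, $context);
    if ($f === false) {
        return null;
    }
    // For http:// streams, wrapper_data lists the response header lines.
    $headers = stream_get_meta_data($f)['wrapper_data'] ?? [];
    $status = 0;
    foreach ($headers as $line) {
        // Keep the last status line, in case redirects were followed.
        if (strncmp($line, 'HTTP/', 5) === 0) {
            $status = parse_status_code($line);
        }
    }
    $body = stream_get_contents($f);
    fclose($f);
    return [$status, $body];
}
```

If the server still behaves as in the log above, calling fetch_with_status("http://simple.wikipedia.org/wiki/Fermat%27s_last_theorem") without a User-Agent would be expected to return 403 together with the "Scripts should use an informative User-Agent string..." message; this has not been re-tested here.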
I then made test program B, which sets a User-Agent with ini_set() before calling fopen().
It retrieves the page without error.
■test program B
# cat test.php
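<?php
// Send this User-Agent header with every request made through the http:// wrapper.
ini_set('user_agent', "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
$f = fopen("http://simple.wikipedia.org/wiki/Fermat%27s_last_theorem", "rb");
if ($f) {
    // Copy the response body to standard output in 8 KB chunks.
    while (!feof($f)) {
        echo fread($f, 8192);
    }
    fclose($f);
}
?>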
#
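Test program B sends a browser User-Agent, which works, but the server's message asks for an informative User-Agent with contact information. A possible variant, sketched below under that reading, sets the header per request through a stream context (the real 'user_agent' option of the http:// context) instead of changing the global setting with ini_set(). The User-Agent string, the e-mail address, and the helper names make_http_context() and fetch_page() are hypothetical placeholders to be replaced with the site's real details.

```php
<?php
// Hypothetical identifying User-Agent: application name, site URL and contact address.
// The e-mail address is a placeholder.
const APP_USER_AGENT = 'ForeignMomNext (http://www.mylovewill.com/; contact: webmaster@example.com)';

/** Build a stream context whose http:// requests send $userAgent as the User-Agent header. */
function make_http_context(string $userAgent)
{
    return stream_context_create(['http' => ['user_agent' => $userAgent]]);
}

/** Fetch $url with the given User-Agent; return the body, or false on failure. */
function fetch_page(string $url, string $userAgent = APP_USER_AGENT)
{
    $f = fopen($url, 'rb', false, make_http_context($userAgent));
    if ($f === false) {
        return false;
    }
    $body = stream_get_contents($f);
    fclose($f);
    return $body;
}
```

Usage would look like $html = fetch_page("http://simple.wikipedia.org/wiki/Fermat%27s_last_theorem"); the context only affects that one fopen() call, so other stream requests in the script keep their own settings.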