
The right way to handle file downloads in PHP
September 1, 2011/by Armand Niculescu

I've seen many download scripts written in PHP, from simple one-liners to dedicated
classes. Yet at least half of them share common errors; in many cases programmers
simply copy code from something that works, without even attempting to
understand what it really does.
What follows is not a complete working download script, but rather a set of issues you
should be aware of, which will allow you to write better code.

1. Never accept paths as input


It's very tempting to write something like
readfile($_GET['file']);

but before you do, think about it: anyone could request any file on the server, even if
it's outside the public HTML area. Guessing is not too difficult, and in a few tries an
attacker could obtain configuration or password files.
You might think you're being extra clever by doing something like
$mypath = '/mysecretpath/' . $_GET['file'];

but an attacker can use relative paths such as ../../etc/passwd to evade that.


What you must always do is sanitize the input. Accept only file names, like this:
$path_parts = pathinfo($_GET['file']);
$file_name = $path_parts['basename'];
$file_path = '/mysecretpath/' . $file_name;

Work only with the file name and add the path to it yourself.
Even better would be to accept only numeric IDs and get the file path and name from a
database (or even a text file or key=>value array if it's something that doesn't change
often). Anything is better than blindly accepting requests.
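The ID-based approach could look roughly like the sketch below. The function name resolve_download(), the map contents and the file names are hypothetical, chosen only for illustration; a database lookup would take the place of the array.

```php
<?php
/**
 * Returns the full path for a download ID, or null when the ID is unknown.
 * The visitor only ever supplies a number, never a path.
 * $downloads is a hypothetical map of ID => file name inside $base_dir.
 */
function resolve_download($id, array $downloads, $base_dir)
{
    // Accept plain digits only; anything else is rejected outright.
    if (!is_string($id) || preg_match('/^[0-9]+$/D', $id) !== 1) {
        return null;
    }
    $id = (int) $id;
    return isset($downloads[$id]) ? $base_dir . $downloads[$id] : null;
}

// Illustrative data only.
$downloads = [
    1 => 'manual.pdf',
    2 => 'setup.zip',
];
$file_path = resolve_download('2', $downloads, '/mysecretpath/');
```

A request would then look like download.php?id=2 (a made-up URL), and any value that is not a known ID simply yields null.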
If you need to restrict access to a file, you should generate encrypted, one-time IDs, so
you can be sure a generated path can be used only once.
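As a rough sketch of the one-time idea, the class below hands out unguessable random tokens and invalidates each one on first use. It swaps encryption for random tokens, which is an assumption on my part, and the OneTimeTokens class with its in-memory array is hypothetical; a real script would keep the tokens in a database or session.

```php
<?php
/**
 * Hypothetical in-memory store for one-time download tokens.
 */
class OneTimeTokens
{
    private $tokens = [];

    // Creates an unguessable token that maps to a download ID.
    public function issue($download_id)
    {
        $token = bin2hex(random_bytes(16)); // 32 hex characters
        $this->tokens[$token] = $download_id;
        return $token;
    }

    // Returns the download ID and invalidates the token,
    // or null if the token is unknown or was already used.
    public function redeem($token)
    {
        if (!is_string($token) || !isset($this->tokens[$token])) {
            return null;
        }
        $download_id = $this->tokens[$token];
        unset($this->tokens[$token]);
        return $download_id;
    }
}
```

Note that random_bytes() requires PHP 7 or later.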

2. Use headers correctly


This is a very widespread problem and unfortunately even the PHP manual is plagued
with errors. Developers usually say "this works for me" and they copy stuff they don't
fully understand.
First of all, I notice the use of headers like Content-Description and
Content-Transfer-Encoding. There is no such thing in HTTP. Don't believe me? Have a look
at RFC 2616: it specifically states that "HTTP, unlike MIME, does not use
Content-Transfer-Encoding, and does use Transfer-Encoding and Content-Encoding". You may
add those headers if you want, but they do absolutely nothing. Sadly, this wrong example
is present even in the PHP manual.
Second, regarding the MIME type, I often see things like Content-Type:
application/force-download. There's no such thing, and Content-Type:
application/octet-stream (RFC 1521) would work just as well (or maybe
application/x-msdownload if it's an exe/dll). If you're thinking about Internet Explorer,
it's even better to specify the type clearly rather than force it to sniff the content.
See "MIME Type Detection in Internet Explorer" for details.
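One way to specify the type clearly is a small lookup by file extension, sketched below. The function name download_content_type() and the contents of the map are illustrative assumptions; only the octet-stream fallback and the exe/dll type come from the text above.

```php
<?php
/**
 * Picks a Content-Type for a download from the file extension.
 * The map is illustrative; unknown types fall back to application/octet-stream.
 */
function download_content_type($file_name)
{
    $types = [
        'exe' => 'application/x-msdownload',
        'dll' => 'application/x-msdownload',
        'pdf' => 'application/pdf',
        'zip' => 'application/zip',
    ];
    $extension = strtolower(pathinfo($file_name, PATHINFO_EXTENSION));
    return isset($types[$extension]) ? $types[$extension] : 'application/octet-stream';
}

// Usage in a real script:
// header('Content-Type: ' . download_content_type($file_name));
```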
Even worse, I see these kinds of statements:
header("Content-Type: application/force-download");
header("Content-Type: application/octet-stream");
header("Content-Type: application/download");

The author must have been really frustrated and added three Content-Type headers.
The only problem is, as specified in the header() manual entry, "The optional replace
parameter indicates whether the header should replace a previous similar header, or
add a second header of the same type. By default it will replace." So unless you
specify header("Content-Type: some-value", FALSE), the new Content-Type header will
replace the old one.

3. Forcing download and Internet Explorer bugs


What would it be like not to have to worry about old versions of Internet Explorer? A
better world, that's for sure.
To force a file to download, the correct way is:

header("Content-Disposition: attachment; filename=\"$file_name\"");

Note: the quotes around the file name are required in case it contains spaces.
The code above will fail in IE6 unless the following are added:
header("Pragma: public");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");

Now, according to Microsoft, this use of Cache-Control is wrong, especially with both
values set to zero, but it works in IE6 and IE7, and later versions ignore it, so no
harm is done.
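The headers from this section can be collected in one place, as in the sketch below. The function name force_download_headers() is hypothetical, and stripping quotes, backslashes and control characters from the file name is an extra precaution I am assuming, not something the text above prescribes.

```php
<?php
/**
 * Builds the header lines that force a download, including the IE6 workarounds.
 * Quotes, backslashes and control characters are dropped from the file name so it
 * cannot break out of the quoted string or inject extra headers.
 */
function force_download_headers($file_name)
{
    $safe_name = preg_replace('/[\x00-\x1F\x7F"\\\\]/', '', basename($file_name));
    return [
        'Content-Disposition: attachment; filename="' . $safe_name . '"',
        'Pragma: public',
        'Cache-Control: must-revalidate, post-check=0, pre-check=0',
    ];
}

// Usage in a real script:
// foreach (force_download_headers($file_name) as $line) { header($line); }
```

Returning the lines instead of calling header() directly keeps the function easy to check outside a web server.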
If you still get strange results when downloading (especially in IE), make sure that the
PHP output compression is disabled, as well as any server compression (sometimes the
server inadvertently applies compression on the output produced by the PHP script).

4. Handling large file sizes


readfile() is a simple way to output files. Historically it had some performance issues,
and while the documentation claims there are no memory problems, real-life scenarios
beg to differ because of output buffering and other subtle things. Regardless, if you
need byte-range support, you still have to output the file the old-fashioned way.
The simplest way to handle this is to output the file in chunks:
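set_time_limit(0); // large files may take longer than the default time limit
$file = fopen($file_path, "rb");
if ($file === false) {
    header('HTTP/1.1 404 Not Found'); // the file could not be opened
    exit;
}
while (!feof($file)) {
    print(fread($file, 1024 * 8)); // send the file in 8 KB chunks
    if (ob_get_level() > 0) {
        ob_flush(); // ob_flush() raises a notice when no buffer is active
    }
    flush();
}
fclose($file);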

If you're on Apache, there's a very cool module called mod_xsendfile that makes the
download simpler and faster. You just output a header and the module takes care of the
rest. Of course, you must be able to install it and it also makes the code less portable, so
you probably won't want to use this for redistributable code.
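With mod_xsendfile the script only sends headers, roughly as sketched below. This assumes the module is installed and enabled with "XSendFile On" in the Apache configuration and that the file lies in a location the module is allowed to serve; the function name xsendfile_headers() is hypothetical, and $file_name is assumed to be sanitized already.

```php
<?php
/**
 * Header lines for handing the transfer over to mod_xsendfile.
 * The module reads the X-Sendfile header and delivers the file itself,
 * so the script must not output the file contents.
 */
function xsendfile_headers($file_path, $file_name)
{
    return [
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . $file_name . '"',
        'X-Sendfile: ' . $file_path,
    ];
}

// Usage in a real script:
// foreach (xsendfile_headers($file_path, $file_name) as $line) { header($line); }
// exit;
```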

5. Disable Gzip / output compression / output buffering


This is the source of many seemingly obscure errors. If you have output buffering enabled,
the file will not be sent to the user in chunks but only at the end of the script. Secondly,
you're most likely outputting a binary file that does not need compression anyway.
Thirdly, some older browser+server combinations might become confused because you're
requesting a text file (PHP) but sending compressed data with a different content
type.
To avoid this, assuming you're using Apache, create a .htaccess file in the folder
containing your download script with this directive:
SetEnv no-gzip dont-vary

This will disable compression in that folder.
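If editing .htaccess is not an option, something similar can be attempted from inside the script. The sketch below is one possible approach, not a guaranteed fix: the function name disable_output_layers() is hypothetical, and apache_setenv() is available only when PHP runs as an Apache module.

```php
<?php
/**
 * Turns off compression and buffering from inside the script.
 * Call it before any output is sent.
 */
function disable_output_layers()
{
    // Ask Apache not to gzip this response (Apache module SAPI only).
    if (function_exists('apache_setenv')) {
        apache_setenv('no-gzip', '1');
    }
    // PHP's own zlib compression can only be changed before output starts.
    if (!headers_sent()) {
        ini_set('zlib.output_compression', 'Off');
    }
    // Discard every active output buffer so chunks reach the client immediately.
    while (ob_get_level() > 0) {
        if (!@ob_end_clean()) {
            break; // a buffer that cannot be removed; stop instead of looping forever
        }
    }
}
```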

6. Resumable downloads
For large files, it's useful to allow downloads to be resumed. Doing so is more involved,
but it's really worth doing, especially if you serve large files or video/audio.
I'm not going to write a complete example, only point you in the right direction.
First, you need to signal the browser that you support ranges:
header("Accept-Ranges: bytes");

Again, I've seen examples in which the actual byte range is given (e.g. 0-1000), which is
wrong, according to the specs.
At the start of your script, after checking the file (if it exists, etc.), you have to check if a
range is requested:
if (isset($_SERVER['HTTP_RANGE']))
$range = $_SERVER['HTTP_RANGE'];

Ranges can be expressed like bytes=0-99 for the first 100 bytes, bytes=-99 for the last
99 bytes, bytes=100- to skip the first 100 bytes, or bytes=1720-8392 for something in
the middle. Be aware that multiple ranges can be specified (e.g. bytes=100-200,400-), but
processing and especially delivering those ranges is more complicated, so no one
bothers.
So, now that you have the range, you have to make sure that it's expressed in bytes, that
it does not contain multiple ranges and that the range itself is valid (end is not smaller
than the start, start is not negative, and end does not lie beyond the end of the file).
Note that bytes=- is not a valid request. If the range is not valid, you must output
header('HTTP/1.1 416 Requested Range Not Satisfiable');

(yet again, many scripts get this wrong by sending 400 errors or other codes). Do not
try to guess or fix the range(s) as it may result in corrupted downloads, which are more
dangerous than failed ones.
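The validation rules above can be sketched as a single function. The name parse_range() is hypothetical, and the sketch deliberately follows the strict rules of this article: it rejects an end or suffix that reaches beyond the file, even though the HTTP specification would allow a server to trim such a range to the file size.

```php
<?php
/**
 * Parses a Range header for a file of $filesize bytes.
 * Returns [start, end] (both inclusive) or false when the script should answer 416.
 */
function parse_range($header, $filesize)
{
    // Exactly one range, expressed in bytes: "start-end", "start-" or "-suffix".
    if (!preg_match('/^bytes=(\d*)-(\d*)$/D', trim($header), $m)) {
        return false; // wrong unit, multiple ranges or garbage
    }
    list(, $first, $last) = $m;
    if ($first === '' && $last === '') {
        return false; // "bytes=-" names no range at all
    }
    if ($first === '') {
        // Suffix form: the last $length bytes of the file.
        $length = (int) $last;
        if ($length < 1 || $length > $filesize) {
            return false;
        }
        return [$filesize - $length, $filesize - 1];
    }
    $start = (int) $first;
    $end = ($last === '') ? $filesize - 1 : (int) $last;
    if ($start > $end || $end >= $filesize) {
        return false;
    }
    return [$start, $end];
}
```

A script would call it with $_SERVER['HTTP_RANGE'] and filesize($file_path), send the 416 header and stop when the result is false, and otherwise use the returned pair as $start and $end.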
Then, you must send a bunch of headers:
header('HTTP/1.1 206 Partial Content');
header('Accept-Ranges: bytes');
header("Content-Range: bytes $start-$end/$filesize");
$content_length = $end - $start + 1;
header("Content-Length: $content_length");

Every line contains a gotcha. Many developers forget to send the 206 code or
the Accept-Ranges header. Don't forget that given a file size of 1000 bytes, a full range
would be 0-999, so the header would be expressed as Content-Range: bytes 0-999/1000.
Yet others forget that when you send a range, the Content-Length must match the length
of the range rather than the size of the whole file.
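To make the arithmetic concrete, here is a small sketch that builds the same header lines from a validated range. The function name partial_content_headers() is hypothetical.

```php
<?php
/**
 * Header lines for a 206 response covering bytes $start to $end (inclusive)
 * of a file that is $filesize bytes long.
 */
function partial_content_headers($start, $end, $filesize)
{
    $length = $end - $start + 1; // length of the range, not of the file
    return [
        'HTTP/1.1 206 Partial Content',
        'Accept-Ranges: bytes',
        "Content-Range: bytes $start-$end/$filesize",
        "Content-Length: $length",
    ];
}

// Usage in a real script:
// foreach (partial_content_headers($start, $end, $filesize) as $line) { header($line); }
```

For a 1000-byte file, the range 100-199 gives Content-Range: bytes 100-199/1000 and Content-Length: 100.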
You can output the file using the method described above, skipping until the start of the
range and delivering the length of the range.
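Skipping to the start and delivering only the length of the range might look like the sketch below. The function name send_range() and its parameters are hypothetical; it writes to a stream passed in by the caller, which in a download script would typically be one opened on php://output.

```php
<?php
/**
 * Copies bytes $start to $end (inclusive) from $in to $out in small chunks.
 * Returns the number of bytes written.
 */
function send_range($in, $out, $start, $end, $chunk_size = 8192)
{
    fseek($in, $start); // skip to the start of the range
    $remaining = $end - $start + 1;
    $sent = 0;
    while ($remaining > 0 && !feof($in)) {
        // Never read past the end of the requested range.
        $data = fread($in, min($chunk_size, $remaining));
        if ($data === false || $data === '') {
            break; // read error or unexpected end of file
        }
        fwrite($out, $data);
        $remaining -= strlen($data);
        $sent += strlen($data);
    }
    return $sent;
}

// Usage in a real script:
// $out = fopen('php://output', 'wb');
// send_range($file, $out, $start, $end);
```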

Closing thoughts
I did my best to provide only accurate information. It would be truly sad for me if an
article about avoiding common PHP errors contained errors itself.
Regardless, my point stands: PHP makes it easy to hack together code that appears to
be working, but developers should read and adhere to the official specifications.
UPDATE: I released a free script that adheres to the above guidelines.
