On Sun, 2011-12-18 at 06:17 -0300, Fernando Cassia wrote:
On Sun, Dec 18, 2011 at 04:14, Joe Zeff <joe(a)zeff.us> wrote:
> > Basically, the system tries to find a place big enough to hold the entire
> > file instead of putting the first chunk into the first place it finds.
> Please confirm if I understood this right, as I'm not familiar with
> the low-level APIs involved with file creation.
No offense, but my advice to anyone playing around with Linux or Unix
systems at a level beyond the end-user is to read up on the basics of
the core file APIs, even if you're never going to write a program.
Understanding how the file abstraction works is to my mind a question of
basic culture around here. As a piece of engineering design balancing
functional elegance with practicality it's a wonderful thing to
contemplate and a key element of the success of the Unix model,
especially when you compare it to the competition (of course by now the
competition has essentially lifted the best parts but it wasn't always
that way).
You should at least look at creat(2), open(2), lseek(2), read(2),
write(2) and unlink(2).
Anyway ...
> Is there a way to tell
> the filesystem that you're creating a file with total size "x" before
> any such data is written to it, I mean, as part of the file creation
> call?
No.
> I mean, it is one thing to create a file with size 0, then start
> appending data to it in chunks, and another to say "hey, I'm creating an
> 8-gigabyte-long file, with name xyz". If the latter exists, I'm
> curious whether there's logic at the filesystem level to try to find a
> chunk of free space big enough to allocate it (to reduce
> fragmentation).
Some filesystem implementations may allow this and some not, but the
basic APIs are a lowest common denominator and don't include it
directly. Whether or not you preallocate space has no effect on the
semantics of accessing the file, so it's an optimization issue and
different implementations may do it in different ways, e.g. some may use
extents -- so they always preallocate a certain minimum amount -- but
not all do.
> Is that what you are saying?
> I do know that, for instance, some BitTorrent clients (Vuze, formerly
> Azureus, comes to mind) allocate the full size of the file being
> retrieved, then start populating (writing) segments as those are
> downloaded, but I never knew if the file creation call was a single
> one or whether it actually consisted of the file creation call first,
> and then a write of the x gigabytes of zeroes...
The BT clients do this not for speed optimization (irrelevant for this
use case) but as a way of reserving space. That way there's no danger of
running out of room in the middle of a large BT transfer.
> I can't believe that in this day and age (I briefly looked at the
> Win32 API and it seems there's no API to create a fixed-size empty
> file) there's no API for this, and that one has to rely on a per-app
> implementation (i.e. filling zeroes).
Believe it. However, don't think that allocation is done by writing
zeroes; in some implementations, writing a block- or extent-aligned
buffer of zeroes won't actually send any data to the disk.
Also, take a look at fallocate(2), but note that it's Linux-specific.
> Why am I asking this? Because of this lament about the lack of a
> "mkfile" command in Linux as there is in Solaris:
> http://madbodger.livejournal.com/114433.html
Mkfile is a *command*, i.e. a program written using the API. You could
just as easily write mkfile in Linux (maybe someone has done it, I don't
know).
> Just curious... (I know, you will tell me "it isn't the job of a
> filesystem to populate the contents of an empty file!"). And maybe
> you'd be right. Still, I wonder if perhaps fixed-size, empty-file
> creation wouldn't be much faster if it was implemented at the
> filesystem level.
Consider the following program:
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd;
    fd = creat("myfile", 0666);      /* create an empty file */
    lseek(fd, 100000, SEEK_END);     /* seek 100000 bytes past the end */
    write(fd, "end", 4);             /* write 4 bytes there */
    return 0;
}
Save as (say) hole.c and do:
$ make hole
$ ./hole
$ ls -l myfile
$ du myfile
$ cat myfile
Now see if you understand what's happening.
poc