
Thread: Process Large File

  1. #1

    Thread Starter
    Frenzied Member StrangerInBeijing's Avatar
    Join Date
    Mar 2005
    Location
    Not in Beijing
    Posts
    1,666

    Process Large File

    Hi,
    I have a huge file that I want to open using PHP, process line by line, and save the data to the database.
    Importing directly into MySQL is not an option, as I have to do some processing on each line, like converting the case and working out whether, and to which table, the data has to go.

    At the moment I can't do this, as the script gives a timeout error.
    Code:
    	$fd = fopen($filename, "r");
    	if ($fd === false) {
    		die("Cannot open $filename");
    	}
    	// read one CSV row per iteration into an indexed array;
    	// fgetcsv() returns false at end of file, so no separate feof() check is needed
    	while (($buffer = fgetcsv($fd, 4096)) !== false) {
    		if ($buffer === array(null)) {
    			continue; // fgetcsv() returns array(null) for a blank line
    		}
    		$country_code = $buffer[2];
    		$country = $buffer[3];
    		$region = $buffer[4];
    		// assumption: city is the next column; the original read $buffer[4] for both region and city
    		$city = $buffer[5];
    		//figure out what to do with the data and insert into the database if it does not exist
    		//does the country exist?  if not, put it in the country table
    		//does the region exist?  if not, put it in the region table
    		//does the city exist?  if not, put it in the city table
    	}
    	fclose($fd);

  2. #2

    Thread Starter
    Frenzied Member StrangerInBeijing's Avatar
    Join Date
    Mar 2005
    Location
    Not in Beijing
    Posts
    1,666

    Re: Process Large File

    I was thinking...
    Maybe I could make that loop run only a certain number of times and then exit.
    As I read and process a record, I would remove it from the file.
    Is it possible to delete lines from a file while you are reading it?

  3. #3
    PowerPoster kfcSmitty's Avatar
    Join Date
    May 2005
    Posts
    2,248

    Re: Process Large File

    The default maximum run time for a PHP script (max_execution_time) is 30 seconds.

    You could either do as you say (I believe you would have to delete the file and rewrite it, as you cannot remove lines from an open file, though I'm not 100% sure), or you could try setting a longer timeout.

    Here is a function that will allow your script to run longer:

    http://de3.php.net/set-time-limit
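
    A minimal sketch of the call, assuming your host has not disabled set_time_limit(). The 300 second value is only an example. Each call restarts the timer from zero, so calling it again every so many rows inside the import loop is another option.

```php
<?php
// Sketch only: raise the execution limit before starting a long import.
// set_time_limit() returns true on success and false if the limit could not be changed.
$ok = set_time_limit(300);   // allow 300 more seconds, counted from this call
// set_time_limit(0);        // alternatively, remove the limit entirely

if (!$ok) {
    echo "Could not change the time limit on this host\n";
}
```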

  4. #4
    VBA Nutter visualAd's Avatar
    Join Date
    Apr 2002
    Location
    Ickenham, UK
    Posts
    4,906

    Re: Process Large File

    You can open the file in read/write mode and use fseek() to move to the location you want. The problem is, like smitty said, that you cannot remove the data in place; you can only shift all the following lines up, which is hugely inefficient.

    Alternate option:
    • As the file is too big, you cannot store it in memory and therefore cannot deal with the whole thing at once.
    • Open your source file and process it one line at a time, or however many lines you need to process together.
    • As the processed data is spat out, put it in a temporary file. The tempnam() function will help you here.
    • Once done, close the source file and copy the temporary file in its place.

    It's just a glorified copy and paste.
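
    The steps above could be sketched roughly like this. rewrite_file() and its $transform callback are hypothetical names for illustration, and the sketch uses rename() rather than a copy to put the temporary file in place, which assumes both files sit on the same filesystem.

```php
<?php
// Hypothetical helper: rewrite $source line by line through a temporary file.
// $transform receives one line and returns the line to keep, or null to drop it.
function rewrite_file($source, $transform)
{
    $in = fopen($source, 'r');
    if ($in === false) {
        return false;
    }
    // Create the temp file next to the source so rename() stays on one filesystem.
    $tmpName = tempnam(dirname($source), 'tmp');
    $out = fopen($tmpName, 'w');
    if ($out === false) {
        fclose($in);
        return false;
    }

    // Only one line is held in memory at any time.
    while (($line = fgets($in)) !== false) {
        $result = call_user_func($transform, $line);
        if ($result !== null) {
            fwrite($out, $result);
        }
    }
    fclose($in);
    fclose($out);

    // Replace the source with the processed copy.
    return rename($tmpName, $source);
}

// Usage: upper-case every line and drop blank ones.
$file = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($file, "beijing\n\nlondon\n");
rewrite_file($file, function ($line) {
    return trim($line) === '' ? null : strtoupper($line);
});
$rewritten = file_get_contents($file);
unlink($file);
```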

    If it is a big file, DO NOT piggyback this on a user's request; run it as a cron job. No user is going to wait 10 years while you process your 2TB file, and if you do it with a task scheduler there is no need for the above method, as you can take as long as you like. In fact, I'd recommend you set its priority to low.

    But what can you do if your host doesn't give you access to a scheduler to fire the job off independently of a request? The solution is to piggyback on other users' requests, but only for a fraction of a second each time.

    To do this:
    • You will need to process the file in portions; you decide how much. To do this you will need to create a third file containing a line / record number or a byte offset. You would then use this with fseek() to go back to that location next time and pick up where you left off (fseek() works on byte offsets, so a stored line number means skipping that many lines instead).
    • You also need to ensure that no two processes open and process the source file simultaneously - it will cause all kinds of problems. You will in effect be using a type of threading, and every precaution you take when threading should be taken here. I suggest a lock file.

    Using the above method, the records will be updated and processed at regular intervals. As you are doing it in bite-sized chunks you will not need to worry about memory, and it is by far the most efficient way of carrying out any kind of batch processing if your host does not allow you to set up scheduled jobs (that said - if they don't, ditch them ASAP).
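
    A rough sketch of the chunked approach, under a few assumptions: process_chunk() and the file names are hypothetical, the third file stores a byte offset from ftell(), and the lock file is guarded with a non-blocking flock(), which is advisory and only helps if every process uses it.

```php
<?php
// Hypothetical helper: handle up to $maxLines lines per call, resuming from the
// byte offset saved in $offsetFile. Returns the number of lines handled, or
// false if another process holds the lock or a file cannot be opened.
function process_chunk($source, $offsetFile, $lockFile, $maxLines, $handler)
{
    $lock = fopen($lockFile, 'c');
    if ($lock === false || !flock($lock, LOCK_EX | LOCK_NB)) {
        if ($lock !== false) {
            fclose($lock);
        }
        return false; // someone else is already working on the file
    }

    $done = false;
    $in = fopen($source, 'r');
    if ($in !== false) {
        // Pick up where the previous call left off.
        $offset = is_file($offsetFile) ? (int) file_get_contents($offsetFile) : 0;
        fseek($in, $offset);

        $done = 0;
        while ($done < $maxLines && ($line = fgets($in)) !== false) {
            call_user_func($handler, rtrim($line, "\r\n"));
            $done++;
        }
        // Remember where to resume next time.
        file_put_contents($offsetFile, (string) ftell($in));
        fclose($in);
    }

    flock($lock, LOCK_UN);
    fclose($lock);
    return $done;
}

// Usage: a three-line file processed two lines per request.
$src = tempnam(sys_get_temp_dir(), 'src');
$off = $src . '.offset';
$lck = $src . '.lock';
file_put_contents($src, "a\nb\nc\n");
$seen = array();
$collect = function ($line) use (&$seen) { $seen[] = $line; };
$first = process_chunk($src, $off, $lck, 2, $collect);
$second = process_chunk($src, $off, $lck, 2, $collect);
unlink($src);
unlink($off);
unlink($lck);
```

    A return value of 0 means the end of the file has been reached, while false means the chunk was skipped, so the caller can tell the two apart with ===.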