Dec 18th, 2007, 03:39 AM
#1
Thread Starter
Frenzied Member
Process Large File
Hi,
I have a huge file that I want to open with PHP, process line by line, and save to the database.
Importing directly into MySQL is not an option, as I have to do some processing on each line, like converting the case and working out whether, and to which table, the data has to go.
At the moment I can't do this, as the script gives a timeout error.
Code:
$fd = fopen($filename, "r");
if ($fd === false) {
    die("Cannot open $filename");
}
// read one CSV row per iteration; fgetcsv() returns false at end of file,
// so no separate feof() check is needed
while (($buffer = fgetcsv($fd, 4096)) !== false) {
    // $buffer is a numerically indexed array holding the fields of this row
    if (count($buffer) < 6) {
        continue; // skip blank or short lines
    }
    $country_code = $buffer[2];
    $country = $buffer[3];
    $region = $buffer[4];
    $city = $buffer[5]; // assumed column index; adjust to your file layout
    // figure out what to do with the data and insert it into the database if it does not exist:
    // does the country exist? if not, put it in the country table
    // does the region exist? if not, put it in the region table
    // does the city exist? if not, put it in the city table
}
fclose($fd);
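The three "does it exist?" checks in the loop could be sketched as below. This assumes names should be stored in title case, and that the distinct countries, regions and cities (unlike the file itself) are few enough to cache in memory, so the database only needs to be asked about values not seen before. normalizeName() and pendingInserts() are hypothetical helpers, the city column index 5 is an assumption, and a real version would run the actual SELECT/INSERT for each entry returned.

```php
<?php
// Hypothetical helper: normalise case, e.g. "NEW YORK" -> "New York".
// Assumes plain ASCII names; multibyte text would need mb_convert_case().
function normalizeName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

// Hypothetical helper: return the [table, path] pairs that still need inserting
// for one CSV row. $seen caches what has already been handled, so repeated
// values cost no database round trip. Assumed columns: 3 = country,
// 4 = region, 5 = city.
function pendingInserts(array $row, array &$seen): array
{
    $country = normalizeName($row[3] ?? '');
    $region  = normalizeName($row[4] ?? '');
    $city    = normalizeName($row[5] ?? '');

    // Regions and cities are keyed by their parents, because names such as
    // "Springfield" repeat across regions.
    $candidates = [
        'country' => [$country],
        'region'  => [$country, $region],
        'city'    => [$country, $region, $city],
    ];

    $inserts = [];
    foreach ($candidates as $table => $path) {
        if ($path[count($path) - 1] === '') {
            continue; // empty field, nothing to insert
        }
        $key = implode("\0", $path);
        if (isset($seen[$table][$key])) {
            continue; // already handled earlier in the file
        }
        $seen[$table][$key] = true;
        $inserts[] = [$table, $path];
    }
    return $inserts;
}
```

For example, a first row of "UNITED STATES, new york, NEW YORK" yields entries for all three tables, while a later row with a new city in the same region yields only the city entry.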
-
Dec 18th, 2007, 09:40 AM
#2
Thread Starter
Frenzied Member
Re: Process Large File
I was thinking...
Maybe I could make that loop run only a certain number of times and then exit.
As I read and process each record, I would remove it from the file.
Is it possible to delete lines from a file you are reading?
-
Dec 18th, 2007, 12:48 PM
#3
Re: Process Large File
The default maximum run time for a PHP script (the max_execution_time setting) is 30 seconds.
You could either do as you suggest (I believe you would have to rewrite the file without the processed lines, since you cannot simply remove lines from an open file [not 100% sure]), or you could set a longer time limit.
The set_time_limit() function will allow your script to run longer:
http://de3.php.net/set-time-limit
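A minimal sketch of two ways to use set_time_limit(): lifting the limit entirely, or keeping a finite limit and restarting the timer every few rows. liftTimeLimit() and processRows() are hypothetical helper names, and the 1000-row interval is arbitrary.

```php
<?php
// Remove the time limit for this run (0 means "no limit").
// Some hosts disable set_time_limit(), so check that it exists first.
function liftTimeLimit(): bool
{
    return function_exists('set_time_limit') && set_time_limit(0);
}

// Keep a finite limit, but restart the timer every $every rows, so a normal
// run can continue while a single stuck row still cannot hang forever.
function processRows(iterable $rows, callable $handle, int $every = 1000, int $seconds = 30): int
{
    $count = 0;
    foreach ($rows as $row) {
        $handle($row);
        $count++;
        if ($count % $every === 0) {
            set_time_limit($seconds); // restarts the timeout counter from zero
        }
    }
    return $count;
}
```

According to the PHP manual, on non-Windows systems the limit counts only time spent executing the script itself, not time spent waiting on database queries or stream operations, so an import that mostly waits on MySQL can run longer than the limit suggests.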
-
Dec 18th, 2007, 06:08 PM
#4
Re: Process Large File
You can open the file in read/write mode and use fseek() to move to the location you want. The problem, as smitty said, is that you cannot remove the data; you can only shift all the following lines up, which is hugely inefficient.
Alternate option:
- As the file is too big, you cannot hold it in memory.
- Open your source file and process it one line (or a few lines) at a time. Since the whole file cannot be held in memory, you cannot deal with it all at once.
- As the processed data is produced, write it to a temporary file. The tempnam() function will help you here.
- Once done, close the source file and copy (or rename) the temporary file into its place.
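The temporary-file approach could look like this minimal sketch. rewriteFile() is a hypothetical helper; its callback returns the rewritten line, or null to drop it, which also covers the original question of removing lines from a file.

```php
<?php
// Stream $path through $transform into a temp file, then swap it in.
// Only one line is held in memory at a time.
function rewriteFile(string $path, callable $transform): void
{
    // Create the temp file next to the original so rename() can replace it.
    $tmp = tempnam(dirname($path), 'rw_');
    if ($tmp === false) {
        throw new RuntimeException("Cannot create temp file next to $path");
    }
    $in  = fopen($path, 'r');
    $out = fopen($tmp, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException("Cannot open $path or $tmp");
    }
    while (($line = fgets($in)) !== false) {
        $new = $transform($line);
        if ($new !== null) {      // null means "drop this line"
            fwrite($out, $new);
        }
    }
    fclose($in);
    fclose($out);
    // Replace the original with the processed copy.
    if (!rename($tmp, $path)) {
        unlink($tmp);
        throw new RuntimeException("Cannot replace $path");
    }
}
```

Keeping the temporary file in the same directory keeps the rename() on one filesystem. tempnam() typically creates the file readable only by its owner, so a chmod() may be needed if other users must read the result.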
It's just a glorified copy and paste. 
If it is a big file, DO NOT piggyback this on a user's request; run it as a cron job instead. No user is going to wait 10 years while you process your 2TB file, and if you run it from a task scheduler there is no need for the above method: you can take as long as you like. In fact, I'd recommend you set its priority to low.
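A hedged sketch of the cron-job setup: the script refuses to run inside a web request and lowers its own scheduling priority with proc_nice(). isCli() and lowerPriority() are hypothetical names, and the increment of 10 is an arbitrary choice.

```php
<?php
function isCli(): bool
{
    return PHP_SAPI === 'cli';
}

// proc_nice() adds $increment to the process's nice value; a positive
// increment lowers priority and needs no special privileges.
// function_exists() guards against builds where it is unavailable.
function lowerPriority(int $increment = 10): bool
{
    return function_exists('proc_nice') && proc_nice($increment);
}

if (!isCli()) {
    // No visitor should wait for this job: only run from cron / the CLI.
    http_response_code(403);
    exit('This job must be run from the command line.');
}
lowerPriority();
set_time_limit(0); // the CLI default is already 0 (no limit); stated for clarity
// ... open the file and process it line by line here ...
```

A crontab entry such as "30 2 * * * php /path/to/import.php" (schedule and path are placeholders) would then start it every night at 02:30.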
But what can you do if your host doesn't give you access to a scheduler to fire the job off independently of a request? The solution is to piggyback on other requests, but only for a fraction of a second each.
To do this:
- You will need to process the file in portions; you decide how big. To do this, create a third file that stores a line/record number or, better, a byte offset (from ftell()). Next time, use it with fseek() to jump back to that location and pick up where you left off.
- You also need to ensure that no two processes open and process the source file simultaneously, as that will cause all kinds of problems. You will in effect be using a form of threading, and every precaution you would take when threading should be taken here. I suggest a lock file.
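The two steps above can be combined into one minimal sketch: each call handles at most $maxLines rows or $maxSeconds of work, saves the byte offset from ftell() in a state file (fseek() works with byte offsets, not line numbers), and takes a non-blocking flock() on a lock file so overlapping requests simply skip their turn. processChunk() and the file names are hypothetical.

```php
<?php
// $handleRow stands in for the per-row database logic.
function processChunk(
    string $source,
    string $stateFile,
    string $lockFile,
    int $maxLines,
    float $maxSeconds,
    callable $handleRow
): int {
    // Exclusive, non-blocking lock: if another request is already working
    // on the file, return at once instead of waiting.
    $lock = fopen($lockFile, 'c');
    if ($lock === false) {
        return 0;
    }
    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        fclose($lock);
        return 0;
    }

    try {
        // The state file holds the byte offset where the previous run stopped.
        $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

        $fd = fopen($source, 'r');
        if ($fd === false) {
            throw new RuntimeException("Cannot open $source");
        }
        fseek($fd, $offset);

        $deadline = microtime(true) + $maxSeconds;
        $done = 0;
        // Stop after $maxLines rows, when the time budget is spent, or at EOF.
        // Length 0 means no line-length limit; separator, enclosure and
        // escape are passed explicitly.
        while ($done < $maxLines
            && microtime(true) < $deadline
            && ($row = fgetcsv($fd, 0, ',', '"', '\\')) !== false) {
            $handleRow($row);
            $done++;
        }

        // ftell() is the byte position just after the last row read,
        // so the next call resumes exactly there.
        file_put_contents($stateFile, (string) ftell($fd));
        fclose($fd);
        return $done;
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

A page could call it with a small budget, for example 200 rows and 0.2 seconds, so each visitor pays only a fraction of a second. If $handleRow throws, the offset is not saved and those rows are processed again next time, so the per-row work should be safe to repeat; the "insert only if it does not exist" checks already make it so.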
Using the above method, the records will be processed at regular intervals. As you are working in bite-sized chunks, you will not need to worry about memory, and it is by far the most efficient way of carrying out any kind of batch processing if your host does not allow you to set up scheduled jobs (that said, if they don't, ditch them ASAP).