Results 1 to 8 of 8

Thread: [RESOLVED] MySQL / PHP Purge Script

  1. #1

    Thread Starter
    Hyperactive Member
    Join Date
    Jul 2008
    Posts
    268

    Resolved [RESOLVED] MySQL / PHP Purge Script

    Hey guys,

    I manage a very large service that has dozens of MySQL read/writes coming in every second - one of our tables now has over 5+ million rows (and about 500MB of data), and the structure has three columns:

    userid | data | timestamp

    "userid" is a unique identifier for the user responsible for the data. Some users have thousands and thousands of entries -- others have only a few.

    For those users with greater than 600 entries, I'd like to keep only their most recent 600 entries. Luckily we store a timestamp!

    So, from a programming standpoint my thought was that I would have to loop through every single row, identify "duplicate" userids, put them into an array with their respective data and timestamps, and eventually purge those arrays down to 600 rows only, etc..

    You can see that this would be pretty complicated, so I thought I would ask if any of you know of any good purging scripts / something that might help with this.

    As advanced as this is, it can't be uncommon.

    Thanks - let me know if you need any clarification!

    David

  2. #2
    Frenzied Member
    Join Date
    Apr 2009
    Location
    CA, USA
    Posts
    1,516

    Re: MySQL / PHP Purge Script

    For almost any table you make, you should have a unique row id column. Assuming you added one (called "rowid"), here are 2 MySQL methods that don't work:
    Code:
    DELETE FROM myTable WHERE rowid IN (SELECT rowid FROM myTable WHERE userid=5 ORDER BY timestamp DESC LIMIT 600,9999)
    What this ought to do in the subquery is retrieve the rowids of a certain userid (5) ordered by descending timestamp (so newest to oldest) and skipping the first 600 (the "9999" value is arbitrary). Then delete all of those rows. If you try to run this however, you'll get an error saying that MySQL doesn't [currently] permit a subquery with LIMIT to be used for IN.

    Code:
    DELETE FROM myTable WHERE userid=5 AND rowid <= (SELECT rowid FROM myTable WHERE userid=5 ORDER BY timestamp DESC LIMIT 600,1)
    In this one, the subquery would return just one rowid, of the 601st row belonging to userid 5. Any rowids equal to or less than that (and belonging to userid 5), should be deleted. But this won't work because [currently] you can't use a DELETE and a SELECT with the same table in one statement.

    In both cases, the problem is essentially having a subquery. So, if you separate them out, then you could use either approach. The second one is simpler:
    Code:
    $sql = "SELECT rowid FROM myTable WHERE userid=5 ORDER BY timestamp DESC LIMIT 600,1";
    $result = mysql_query($sql);
    $rowid = mysql_result($result,0);
    $sql = "DELETE FROM myTable WHERE userid=5 AND rowid <= ".$rowid;
    mysql_query($sql);
    I don't consider SQL my strong suit though, so I'd defer to others' advice, and possibly pose your question to the Database Development forum, as they might be able to suggest more efficient SQL.
    Last edited by SambaNeko; Feb 28th, 2010 at 01:35 AM.

  3. #3
    PowerPoster
    Join Date
    Sep 2003
    Location
    Edmonton, AB, Canada
    Posts
    2,629

    Re: MySQL / PHP Purge Script

    this doesn't sound very advanced at all (and I'm really surprised if your table literally only has those three columns; you don't even have a primary key?). anyhow, I couldn't think of any pure MySQL solutions that didn't require an actual batch SQL script, so here's a hacky-feeling PHP and MySQL solution.

    first, I'll explain my testing bed. I created a script that inserted 1000 rows into a table which all had a random timestamp (generated by rand(10000000000, 99999999999), so it's a unix timestamp) and a random userid (from 1 to 3, created so that at least one of those userids would have substantially less entries). then, I ran this query:

    Code:
    SELECT userid, COUNT(*) FROM `purge` GROUP BY userid;
    the result was the following:
    Code:
    +--------+---------+
    | userid | COUNT(*)| 
    +--------+---------+
    | 1      | 456     | 
    | 2      | 151     |
    | 3      | 304     |
    +--------+---------+
    (yes, I realize that those numbers only add up to 700 or so. I didn't turn off the default 30 second execution limit)

    now, I created this script:
    PHP Code:
    <?php
      
    //array to store user IDs
      
    $users = array();
      
      
    //the cut off point
      
    $cutoff 250;
      
      
    //this query will select a unique userid and the 251st timestamp from this user
      
    $sql "SELECT DISTINCT userid, (SELECT t.timestamp FROM `purge` t WHERE t.userid=p.userid ORDER BY t.timestamp DESC LIMIT {$cutoff}, 1) cutoff FROM `purge` p;";
      
    $u_query mysql_query($sql);
      while(
    $r_query mysql_fetch_assoc($u_query)){
      
        
    //if this user does not have over 250 records, cutoff will be null
        
    if(isset($r_query['cutoff'])){
          
    $users[$r_query['userid']] = $r_query['cutoff'];
        }
      
      }
      
      
    //loop through the users array and delete anything older than the timestamp we found
      
    foreach($users as $user => $timestamp){
      
        
    $query "DELETE FROM `purge` WHERE userid={$user} AND timestamp<={$timestamp}";
        
    mysql_query($query) or die(mysql_error());
      
      }
    ?>
    in my case, I wanted to trim everything down to 250 ($cutoff). after running this script, I re-ran the counting query:
    Code:
    +--------+---------+
    | userid | COUNT(*)| 
    +--------+---------+
    | 1      | 250     | 
    | 2      | 151     |
    | 3      | 250     |
    +--------+---------+
    putting the execution of this little script into a cron job that ran every night or every week (or even every month) would be pretty simple.

    hope that helps.

    in any case, you should consider re-designing your table.

  4. #4
    PowerPoster
    Join Date
    Sep 2003
    Location
    Edmonton, AB, Canada
    Posts
    2,629

    Re: MySQL / PHP Purge Script

    oh, and hi Samba ;)

  5. #5

    Thread Starter
    Hyperactive Member
    Join Date
    Jul 2008
    Posts
    268

    Re: MySQL / PHP Purge Script

    Fantastic - thanks.

    The table has a primary key, as well as quite a few other columns - I just listed the ones that this issue concerned - sorry about the confusion!

    D

  6. #6
    I'm about to be a PowerPoster!
    Join Date
    Jan 2005
    Location
    Everywhere
    Posts
    13,647

    Re: [RESOLVED] MySQL / PHP Purge Script

    I don't agree with the comment that tables should have an ID column by default — it may very often be the case, but one should also consider whether the data itself has a natural primary key. Sometimes one or more columns will necessarily be unique and other times a combination of columns form the (composite) primary key.

  7. #7
    Frenzied Member I_Love_My_Vans's Avatar
    Join Date
    Jan 2005
    Location
    In the PHP compiler
    Posts
    1,275

    Re: [RESOLVED] MySQL / PHP Purge Script

    +1 pena

  8. #8
    Frenzied Member
    Join Date
    Apr 2009
    Location
    CA, USA
    Posts
    1,516

    Re: [RESOLVED] MySQL / PHP Purge Script

    Quote Originally Posted by penagate View Post
    I don't agree with the comment that tables should have an ID column by default — it may very often be the case, but one should also consider whether the data itself has a natural primary key.
    Indeed. Hence I say, "For almost any table you make..." Not always, but more often than not.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width