Results 1 to 1 of 1

Thread: Confused on cache hits/ CPI when analyzing nested for loops

Threaded View

  1. #1

    Thread Starter
    Hyperactive Member voidflux's Avatar
    Join Date
    Jun 2003
    Location
    Brockway, PA
    Posts
    290

    Arrow Confused on cache hits/ CPI when analyzing nested for loops

    Hello everyone this isn't exactly asm but it does fall into the category. ANy help would be great here is my problem:

    I had the following file that multiplies 2 matrices together, with 100 elements and puts it into a 3rd 100 element matrix.

    Well I used Simple Scalar to anlayize the CPI generated by the code and it also records the miss rate on the level 1 cachee.

    But i'm confused on my findings...Here is what I found:

    Code:
    /* This is a program for a trial on super
       scalar simulator , it multiples to matrices */
    #include <stdio.h>
    
    #define N 100
    
    void main()
    {
      int a1[N][N], a2[N][N], a3[N][N], i, j, k;
      for(i =0; i<N; i++)
        for(j =0; j<N; j++)
          {
    	a1[i][j]= /*1;*/ i * j + 1;
    	a2[i][j]= /*1;*/ i + j + 1;
    	a3[i][j]= 0;
          }
    
       printf( "\nThe first matrix !!!\n");
    
       for(i =0; i<N; i++)
       {
         for(j =0; j<N; j++)
          {
    	printf("\t %d", a1[i][j]);
          }
         printf("\n");
       }
    
       printf( "\nThe second matrix !!!\n");
    
       for(i =0; i<N; i++)
       {
         for(j =0; j<N; j++)
          {
    	printf("\t %d", a2[i][j]);
          }
         printf("\n");
       }
    
       for(i =0; i<N; i++)
         for(j =0; j<N; j++)
            for(k =0; k<N; k++)
    	  a3[i][j] += a1[i][k] * a2[k][j];
    
    
       
       printf( "\nThe result matrix !!!\n");
    
       for(i =0; i<N; i++)
       {
         for(j =0; j<N; j++)
          {
    	printf("\t %d", a3[i][j]);
          }
         printf("\n");
       }

    Code:
    Runs:	         mxm.c	 mxm2.c	  mxm3.c
    Loop Order:	I,j,k	   I,k,j	K,I,j
    Sim_cpi:	0.5832	 0.5832	    0.5799
    ill.miss_rate	 0.0097	  0.0097     0.0097
    dl1.miss_rate  0.0047	0.0047	   0.0050
    ul2.miss_rate  0.0013	0.0013   0.0012
    Okay the Sim_CPI is the #of instructions per cycle
    il1.miss_rate is the instruction level 1 miss rate
    dl1.miss_rate is the data level 1 miss rate
    ul2.miss_rate i'm not quite sure what this is, i'm using a program called Simple Scalar and its not in the documentation


    But what i'm confused is, i know the greater the CPI the worse
    performance. But i'm confused on why the ul2.miss_rate is lower on the
    k,i,j order but the dl1.miss_rate is higher, but the CPI is still lower
    out of all of them. I'm also confused on why the I,j,k and the I,K,J
    are both the same with no change in CPI.

    Any help would be great!


    I was thinking it had to do with the way of ordering the loops has more
    cache hits than the other. i.e. a cache miss means waiting longer for
    the data to be fetched from the next level in the memory hierarchy

    But why?
    why the i,j,k and i,k,j are identical while k,i,j is more efficient.

    Thanks!
    Last edited by voidflux; Feb 9th, 2007 at 04:24 PM.
    C¤ry Sanchez
    Computer Science/Engineering
    @ Penn State
    IBM.zSeries Intern
    Mandriva 2007

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width