
Thread: String Functions

  1. #1

    Thread Starter
    Frenzied Member Technocrat's Avatar
    Join Date
    Jan 2000
    Location
    I live in the 1s and 0s of everyones data streams
    Posts
    1,024

    String Functions

    As part of another project I am working on, I wrote the following string functions:
    IsNumeric
    Upcase
    Lowercase
    Trim
    RTrim
    LTrim
    right
    left
    mid

    I was wondering if you guys could take the time to look at my code and let me know whether it is optimized, or whether there is a better way to do something, so I can fix it. Included is a sample console program just to show how each function works.

    Thanks
    Last edited by Technocrat; Mar 25th, 2002 at 02:01 PM.
    MSVS 6, .NET & .NET 2003 Pro
    I HATE MSDN with .NET & .NET 2003!!!

    Check out my sites:
    http://www.filthyhands.com
    http://www.techno-coding.com


  2. #2
    jim mcnamara
    Guest
    As a suggestion, cut out redundant compares, and don't use external string functions like strlen(). By using the ternary ?: operator you cut down the comparison operations by close to 50%. I used the IsNumeric() function as an example.
    This assumes a null-terminated (sz) string as the argument.

    Code:
    #define _ISDIGIT 0x10
    BOOL IsNumeric(char *chString)
    {
        char *buf = chString;

        /* Keep going while the 0x10 bit is set or the character is '.' */
        while ((*buf & _ISDIGIT) ? 1 : (*buf == '.'))
        {
            buf++;
        }
        return (BOOL)(*buf == 0x00);  /* TRUE only if we got to the end of the string */
    }

  3. #3

    Thread Starter
    Frenzied Member Technocrat's Avatar
    Actually about a week ago I was looking for how to do this exact thing. Where were you then?


  4. #4

    Thread Starter
    Frenzied Member Technocrat's Avatar
    I think I need to understand (*buf & _ISDIGIT) better. I see it works but I am not sure why. Are you masking the pointer here? If so, why are you masking it with 0x10? How does that come back as true or false?
    Last edited by Technocrat; Mar 25th, 2002 at 03:26 PM.


  5. #5
    jim mcnamara
    Guest
    It isn't necessarily better. Sometimes more efficient code is hard to understand, which is definitely a bad thing.

    & _ISDIGIT gives a nonzero value for any char that is an ASCII digit. buf is the pointer; *buf is the VALUE of the character buf is currently aimed at.

    In C, 0 is False, everything else is True. There is no built-in boolean type; BOOL is just a typedef for int, in other words, a number.

    What the code does:

    Checks if the code is a digit. If it fails the digit test, check for a
    '.'. Return the value of the test to control the loop.
    buf++ just moves to the next character. At string end the character is '\0' - the null character. It is not a number so it fails the test, you exit the loop. If you hit a non-number you exit the loop early.

    If you went all the way thru the string to the end, then
    return (*buf==0x00) is True. if not it's False because you exit the loop early.
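
    To make the true/false part concrete, here is a minimal sketch, assuming an ASCII character set. DIGIT_MASK, mask_test and chars_accepted are names made up for this illustration; only the 0x10 value and the loop condition come from the code in post #2.

```c
#include <stddef.h>

#define DIGIT_MASK 0x10  /* same value as _ISDIGIT in post #2; renamed for this sketch */

/* Returns the raw result of the mask test: either 0x10 (16) or 0. */
int mask_test(char c)
{
    return c & DIGIT_MASK;
}

/* Counts how many leading characters of s keep the loop from post #2 running. */
size_t chars_accepted(const char *s)
{
    const char *p = s;

    /* Nonzero mask result counts as true; otherwise fall back to the '.' check. */
    while ((*p & DIGIT_MASK) ? 1 : (*p == '.'))
        p++;
    return (size_t)(p - s);
}
```

    With these, mask_test('7') gives 16 (true) while mask_test('A') and mask_test('\0') give 0 (false), and chars_accepted("12a5") stops after two characters, so the final *buf==0x00 comparison would be false for that string.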

    While this kind of stuff can be fun & instructive (maybe), you shouldn't re-invent the wheel. All of this stuff has been done to death, so consider looking around for algorithms. Especially if you are getting money for your code.

    A really great place is the library - check out Knuth's 'The Art of Computer Programming'. About 40% of the questions in all the forums here were answered really well by this guy. In 1968.

  6. #6

    Thread Starter
    Frenzied Member Technocrat's Avatar
    Originally posted by jim mcnamara
    > It isn't necessarily better. Sometimes more efficient code is hard to understand, which is definitely a bad thing.

    Yeah, I found that out a long time ago. But this time I am really looking for the most efficient code I can.

    > & _ISDIGIT gives a nonzero value for any char that is an ASCII digit. buf is the pointer; *buf is the VALUE of the character buf is currently aimed at.
    >
    > In C, 0 is False, everything else is True. There is no built-in boolean type; BOOL is just a typedef for int, in other words, a number.
    >
    > What the code does:
    >
    > Checks if the code is a digit. If it fails the digit test, check for a
    > '.'. Return the value of the test to control the loop.
    > buf++ just moves to the next character. At string end the character is '\0' - the null character. It is not a number so it fails the test, you exit the loop. If you hit a non-number you exit the loop early.
    >
    > If you went all the way thru the string to the end, then
    > return (*buf==0x00) is True. if not it's False because you exit the loop early.
    >
    > While this kind of stuff can be fun & instructive (maybe), you shouldn't re-invent the wheel. All of this stuff has been done to death, so consider looking around for algorithms. Especially if you are getting money for your code.

    I figured that out by playing with the code. The only thing I really don't understand is what happens here (*buf & _ISDIGIT). What exactly happens at this point?

    > A really great place is the library - check out Knuth's 'The Art of Computer Programming'. About 40% of the questions in all the forums here were answered really well by this guy. In 1968.

    Hmm, I will check it out.

    Also thanks for your help & input.


  7. #7
    Fanatic Member MoMad's Avatar
    Join Date
    Oct 2000
    Location
    Seattle, WA
    Posts
    625
    There's a flaw (bug) in this code. Even though it impressed me so much, it is not quite accurate!!

    According to your code, this is a number:

    <=>:;

    This code is very good by the way. After studying it for a while, I figured out what was going on!

    _ISDIGIT ==> 0x10 ==> 16 ==> 0001 0000 ==> MASK

    So when you do (*buf & _ISDIGIT) you are asking whether the 0x10 bit (bit 4, counting from 0 on the right) is set in the character.

    Breakdown:

    Code:
      our mask is: 0001 0000
     and the numbers are in the 0x3? range...
    
     The BITWISE & operator compares its two operands bit by bit;
     wherever either bit is 0, the result bit is 0.
    
     EG:
    
         a: 1100  0011 ==> C3 ==> 195
         b: 0101  0111 ==> 57 ==> 87
        ==============
       ans: 0100  0011 ==> 43 ==> 67
    
    
    which equals: 67 as illustrated.
    
    Now with numbers, they are all in the range of HEX: 0x3? 
    but so are :;<=>? (colon, semi-colon, less than, equal, greater than, and question mark) 
    
      0 1 2 3 4 5 6 7 8 9 : ; < = > ? 
    
     so now when you get their actual bits (binary representation),
    
    you get:
    
      0 ==> 110000
      1 ==> 110001
      2 ==> 110010
      3 ==> 110011
      4 ==> 110100
      5 ==> 110101
      6 ==> 110110
      7 ==> 110111
      8 ==> 111000
      9 ==> 111001
      : ==> 111010
      ; ==> 111011
      < ==> 111100
      = ==> 111101
      > ==> 111110
      ? ==> 111111
    
    
    |--------------|
    |       11???? |
    | AND   010000 |
    |------------- |
    |       010000 |
    |--------------|
    
    
      Notice how ALL of them have the second bit from the left set to 1
      (that is the 0x10 bit).  Now if you mask any of these values with
      010000 it will ALWAYS return 010000, which is 0x10 in hex and 16 in
      decimal, and DEFINITELY not a zero value.
    Phew!! There goes my day. This was the best thing I've stumbled onto in a very long time, but unfortunately I can't use it in my code... I have a function that checks IsNum like this:

    Code:
      while (((*buf >= '0') ? (*buf <= '9') : 0) || *buf == '.')
    Even though it's quite redundant. But if your data doesn't contain any of those extra characters that pass as "valid numbers", then the mask test above is the best.
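
    For comparison, here is a minimal sketch of a range-based check that does reject the "<=>:;" string. The name is_numeric_strict is made up for this example, and like the original IsNumeric it still treats an empty string or a string of only dots as numeric, so take it as an illustration of the compare, not a complete validator.

```c
/* Hypothetical name; returns 1 if s holds only '0'..'9' and '.', else 0. */
int is_numeric_strict(const char *s)
{
    const char *p = s;

    /* The range compare rejects ':' through '?', which the 0x10 mask lets through. */
    while ((*p >= '0' && *p <= '9') || *p == '.')
        p++;

    /* We only reach the terminator if every character passed the test. */
    return *p == '\0';
}
```

    It costs up to two compares per digit instead of one mask, which is the trade-off being discussed in this thread.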

    By the way, I'm working on an expression evaluator that will convert a mathematical expression into a numeric value. And the first thing it does is make sure the string passed is all numbers...

    Happy Coding

    Regards,
    MoMad
    Last edited by MoMad; Mar 29th, 2002 at 03:23 AM.
    :MoMad:
    Nice Sig!

    http://go.to/momad/ Status: Not Ready

  8. #8
    Fanatic Member MoMad's Avatar
    For more info on BITWISE operators and ASCII values, look into:

    > http://www.cplusplus.com/doc/papers/ascii.html
    > http://www.cplusplus.com/doc/papers/boolean.html

    and NUMERICAL RADIXES (HEX, DEC, OCT, BIN)

    > http://www.cplusplus.com/doc/papers/hex.html

    Also, for everything else, use CALCULATOR (Programs > Accessories > Calc) and view it in SCIENTIFIC MODE ( View > Scientific ), then you can do all of the comparisons that way :P

    That's all, have fun again!!

    Actually, have some more fun!!

    ALSO: http://www.google.com/

    and click on GROUPS to search newsgroups!!

    Regards,
    MoMad

  9. #9
    Fanatic Member MoMad's Avatar
    Wow Jim, that's some real crazy stuff... but I can't believe they fill in a whole char array for the mere purpose of checking the values!!

    You see, I had already thought of that for the simple task of finding out whether the string passed is all numbers... but since I will already be doing an if/else group for each sub group of the expression (i.e. switch (currChar) { case '+', '-', '/', '*', '^')... I really don't need to add overhead to my function.

    By the way, I notice that Borland are MACRO people and that HP is all functions!!

    Anyways, macros are very cool if used right, like anything else!!

    But see how Borland has to fill in an array of 256 chars to find out what the value of the arg is... it's a very cool idea and it would speed things up by thousands, but if you are making a dll, the dll gets called then destroyed after each call, so that step would slow things down.
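
    For anyone who hasn't seen the table approach, here is a rough sketch of the idea. This is not Borland's or HP's actual code; CT_DIGIT, class_table, init_class_table and is_digit_tbl are names invented for this illustration.

```c
#include <limits.h>

#define CT_DIGIT 0x01  /* hypothetical flag value for this sketch */

/* One flag byte per possible character value; statics start out as all zeros. */
static unsigned char class_table[UCHAR_MAX + 1];
static int table_ready = 0;

/* Fills the table once: only '0'..'9' get the CT_DIGIT flag. */
static void init_class_table(void)
{
    int c;

    for (c = '0'; c <= '9'; c++)
        class_table[c] = CT_DIGIT;
    table_ready = 1;
}

/* One array lookup and one mask per character, no range compares. */
int is_digit_tbl(char c)
{
    if (!table_ready)
        init_class_table();
    return (class_table[(unsigned char)c] & CT_DIGIT) != 0;
}
```

    The fill happens only on the first call, and a table written out as a static initializer would not need even that. Unlike the bare "& 0x10" test, the mask here is applied to the table entry, not to the character itself, so ':' and friends come back as 0.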

  10. #10
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    Who would do such an inefficient thing as destroying a dll after each call to it? And if he does, there's so much overhead that a 256 byte array doesn't matter.

    BTW "& 0x10" would also include:
    0x10 to 0x1F (unprintable)
    0x50 to 0x5F (P to _)
    0x70 to 0x7F (p to DEL)
    0x90 to 0x9F (unprintable)
    0xB0 to 0xBF (diacriticae)
    0xD0 to 0xDF ------||------
    0xF0 to 0xFF ------||------
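
    A quick way to check the ranges above is to run the mask over every byte value. This is only a sketch; passes_mask and count_passing are names made up for it.

```c
/* Nonzero if byte c passes the "& 0x10" test from post #2. */
int passes_mask(unsigned char c)
{
    return (c & 0x10) != 0;
}

/* Counts how many of the 256 possible byte values pass the mask test. */
int count_passing(void)
{
    int c, n = 0;

    for (c = 0; c <= 0xFF; c++)
        if (passes_mask((unsigned char)c))
            n++;
    return n;
}
```

    count_passing() comes out at 128, i.e. every row with an odd high nibble (the seven listed plus 0x30 to 0x3F), and only ten of those bytes are actual digits.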
    All the buzzt
    CornedBee

    "Writing specifications is like writing a novel. Writing code is like writing poetry."
    - Anonymous, published by Raymond Chen

    Don't PM me with your problems, I scan most of the forums daily. If you do PM me, I will not answer your question.
