Thread: String Library Design Question

  1. #1

    Thread Starter
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594

    String Library Design Question

    I've asked this question in two other places, but have received very limited feedback, so I'm extending my audience.
    Attend, for my problem is not easy to understand.

    It is about the CornedString library which I'm going to write.

    Preface
    Examining some large open source projects such as Mozilla shows that programmers often want very specific storage semantics for their string class. They may, for example, want to employ a copy-on-write technique for sharing values between string objects. Or they might want their string objects to have a small internal buffer to avoid heap allocations for short strings. There are many strategies for implementing strings, and all have advantages and drawbacks.
    The std::string class does not have specific requirements on its storage semantics. Library writers are free to implement it however they want. The library that comes with MSVC++.Net 2003, for example, employs a 16-byte internal buffer for short strings. The GNU C++ library uses reference counted strings with copy-on-write. One drawback of that approach is that it is not thread-safe.
    As a result, projects repeatedly implement their own specific string classes. This has quite a few drawbacks:
    1) Extended development time for additional coding, testing and debugging.
    2) Incompatible string classes between projects make transferring code hard.
    3) Less reliability than a library solution, which is tested by more than one group.
    4) A string class written as part of a larger program is often less efficient than one from a library, because it is given less attention.

    CornedString
    All these reasons suggest that there ought to be a library of various string implementations, all with clearly defined storage semantics, for broad use.
    My design goals for this library are speed, interoperability and adherence to the requirements on std::string (so that, in theory, any one of these classes could be used as a compliant std::string implementation). Another design goal is that std::string can be treated like any other part of the library.

    Common Base?
    When you have multiple classes that all share the same interface, you might be tempted to offer an abstract base class for all of them, which offers a consistent interface. This would have the advantage that passing any of the derived classes to a generic function is very easy: just use a reference to the common base as the parameter type:
    Code:
    void generic_function(const string_base &str);
    This has drawbacks, though. First, it requires the classes to have a virtual destructor, which the standard does not provide for std::string.
    Second, std::string does not derive from that base, so it could not be used the same way as the other classes.
    Third, it adds the overhead of a vptr to every object, which is not acceptable given the speed requirements.
    We need to find a different solution.
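
    To put a number on the third drawback, here is a minimal sketch. The names plain_string, string_base and derived_string are hypothetical stand-ins, not classes from the library, and the exact overhead depends on the compiler and platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical string layout without any virtual functions.
struct plain_string {
    char *data;
    std::size_t length;
};

// Hypothetical common base, as discussed in the text.
struct string_base {
    virtual ~string_base() {}
    virtual std::size_t size() const = 0;
};

// Same data members as plain_string, but derived from the common base.
struct derived_string : string_base {
    char *data;
    std::size_t length;
    derived_string() : data(nullptr), length(0) {}
    std::size_t size() const override { return length; }
};

// Extra bytes every object pays for deriving from the common base.
inline std::size_t vptr_overhead() {
    return sizeof(derived_string) - sizeof(plain_string);
}
```

    On common implementations vptr_overhead() is the size of one pointer per string object, which every string pays even if it is never passed to a generic function.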

    External vPtr
    There are three ways of parameter passing that my alternative way must support: pass-by-value, pass-by-reference and pass-by-const-reference. Since the classes must not have a common base, I can't use traditional means for this.
    Instead, I created three different wrapper classes for these three ways. These are any_string (pass-by-value), any_string_ref (pass-by-ref) and any_string_const_ref (pass-by-const-ref).
    Any kind of object that is interface-compatible with std::string can be assigned to these three classes. The technique used for this is the same that Boost.Any uses. Effectively, I create a vTable for these objects and wrap an object that contains nothing but a vPtr around the actual string object.
    Uhuh. Let me show you some pseudo-code of how this is done:
    Code:
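    class any_string
    {
      class holder_base
      {
      public:
        // Virtual destructor so that deleting through holder_base* is safe.
        virtual ~holder_base() {}
        virtual void do_something() = 0;
      };
      template <typename Str>
      class holder : public holder_base
      {
        Str held;
      public:
        holder(const Str &s) : held(s) {}
        // Forward the call to the wrapped string.
        virtual void do_something() {
          held.do_something();
        }
      };
    
      holder_base *held;
    public:
      void do_something() {
        held->do_something();
      }
    
      // Wrap any interface-compatible string in a matching holder.
      template <typename Str>
      any_string(const Str &s) {
        held = new holder<Str>(s);
      }
    
      // Copying is left out here; see the operator = discussion.
      ~any_string() {
        delete held;
      }
    };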
    When any_string is assigned an object, it allocates an appropriate subclass of holder_base and stores the value in there. Since holder only holds its vptr and an object of the stored type, I've effectively wrapped a vptr around the stored object. This is why I call the technique external vptr.
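
    A compilable version of the same idea might look like the sketch below. It is an illustration under assumptions, not the library's code: tiny_string is a hypothetical class that merely offers the two std::string members used here (size and c_str), and copying is disabled because the copy semantics are the open question of this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical string with fixed internal storage. It stands in for any
// class that offers the std::string members used below.
class tiny_string {
    char buf_[16];
public:
    tiny_string(const char *s) {
        std::strncpy(buf_, s, sizeof buf_ - 1);
        buf_[sizeof buf_ - 1] = '\0';   // always terminated, long input is cut
    }
    std::size_t size() const { return std::strlen(buf_); }
    const char *c_str() const { return buf_; }
};

class any_string {
    // The "external vtable": one virtual function per forwarded operation.
    struct holder_base {
        virtual ~holder_base() {}
        virtual std::size_t size() const = 0;
        virtual const char *c_str() const = 0;
    };
    // Holds the wrapped string by value and forwards calls to it.
    template <typename Str>
    struct holder : holder_base {
        Str held;
        explicit holder(const Str &s) : held(s) {}
        std::size_t size() const override { return held.size(); }
        const char *c_str() const override { return held.c_str(); }
    };
    holder_base *held_;
public:
    template <typename Str>
    any_string(const Str &s) : held_(new holder<Str>(s)) {}
    // Copying is deliberately disabled in this sketch.
    any_string(const any_string &) = delete;
    any_string &operator=(const any_string &) = delete;
    ~any_string() { delete held_; }

    std::size_t size() const { return held_->size(); }
    const char *c_str() const { return held_->c_str(); }
};
```

    The wrapped classes need no common base and no virtual functions of their own; only the holder objects carry a vptr.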


    My Problem
    So far, so good. However, this raises an interesting problem regarding the semantics of operator =.
    Here's the declaration of operator =:
    Code:
    class any_string { //...
      any_string &operator =(const any_string &rhs);
    };
    The implementation is what worries me.
    In a normal string, there is one level of indirection: the string object redirects operations to its buffer, where the actual character data resides. It is thus quite obvious that an assignment to the string ought to change the character data in the buffer.
    any_string has two levels of indirection: the any_string object redirects operations to its held object, which in turn redirects them to its buffer.
    operator = may be a special case though. My question is, should I, as the any_string, redirect the assignment to the held object, so that the character buffer gets replaced? Or should I replace the held object itself?
    I am unsure which to do. Any and all input is highly appreciated.
    All the buzzt
    CornedBee

    "Writing specifications is like writing a novel. Writing code is like writing poetry."
    - Anonymous, published by Raymond Chen

    Don't PM me with your problems, I scan most of the forums daily. If you do PM me, I will not answer your question.

  2. #2
    Fanatic Member
    Join Date
    Dec 2003
    Posts
    703
    I'm not sure I understand the use of holder/holder_base. Why doesn't any_string store the string directly?
    an ending

  3. #3

    Thread Starter
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    Because that would change the storage semantics.

  4. #4
    Fanatic Member
    Join Date
    Dec 2003
    Posts
    703
    I don't mean store the buffer directly, but store the string object of the templated type, rather than having the holder class..I don't get what the function of the holder class is.

  5. #5

    Thread Starter
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    It keeps the template away from any_string. Suppose you have various string classes, say,
    strings::refcount_string: a refcounted copy-on-write string
    strings::buffered_string: a string with an internal buffer
    strings::hashed_string: a string that's guaranteed to have only one instance of a value across all instances of this class (Java-like)

    And you want a method to be able to take any of these by-value (by-ref or by-const-ref are handled differently, I have no problems there). Then you could write:
    Code:
    void generic_method(any_string str)
    {
      // Do something with str
    }
    
    refcount_string s1("foo");
    buffered_string s2("bar");
    hashed_string s3("atom");
    
    generic_method(s1);
    generic_method(s2);
    generic_method(s3);
    And be sure that generic_method keeps the same storage semantics as the outer value. At the same time, generic_method can manipulate that string without the outer values being affected, because it's a by-value passing and a copy is created.

    It is not possible to implement any_string as a non-template without the holder_base/holder trick while allowing this.
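
    A minimal sketch of the by-value case, assuming a clone() member on the holder. buffered_string here is only a hypothetical stand-in that forwards to std::string, and append, c_str and type are the only operations forwarded.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for a library string; it only forwards to std::string.
struct buffered_string {
    std::string s;
    buffered_string(const char *p) : s(p) {}
    void append(const char *p) { s.append(p); }
    const char *c_str() const { return s.c_str(); }
};

class any_string {
    struct holder_base {
        virtual ~holder_base() {}
        virtual holder_base *clone() const = 0;
        virtual void append(const char *p) = 0;
        virtual const char *c_str() const = 0;
        virtual const std::type_info &type() const = 0;
    };
    template <typename Str>
    struct holder : holder_base {
        Str held;
        explicit holder(const Str &s) : held(s) {}
        // Copies the held string with its own copy constructor.
        holder_base *clone() const override { return new holder(held); }
        void append(const char *p) override { held.append(p); }
        const char *c_str() const override { return held.c_str(); }
        const std::type_info &type() const override { return typeid(Str); }
    };
    holder_base *held_;
public:
    template <typename Str>
    any_string(const Str &s) : held_(new holder<Str>(s)) {}
    // Copying keeps the dynamic type of the held string.
    any_string(const any_string &rhs) : held_(rhs.held_->clone()) {}
    any_string &operator=(const any_string &) = delete;  // not needed here
    ~any_string() { delete held_; }

    void append(const char *p) { held_->append(p); }
    const char *c_str() const { return held_->c_str(); }
    const std::type_info &type() const { return held_->type(); }
};

// Takes its argument by value, so it works on a private copy.
std::string generic_method(any_string str) {
    str.append("!");
    return str.c_str();
}
```

    Calling generic_method with a buffered_string or a std::string converts the argument through the template constructor. The function then modifies its own copy and the caller's string stays as it was.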

  6. #6
    Fanatic Member riis's Avatar
    Join Date
    Nov 2001
    Posts
    551
    In what circumstances do you intend to use any_string?
    Only as a function argument, or will it be possible to replace any_string throughout the entire program where it's used? (The latter will be possible, so I guess you mean this case.)

    I would implement both ways, depending on the situation (and if you're allowed to use RTTI).
    If the type of held is the same as rhs.held, then you only need to copy the buffer. If the type is different, then it is unavoidable to delete held (old type) and create held again (new type). The type of the held object cannot just change. (It would be wrong for a refcount_string to hold a buffered_string.)

  7. #7

    Thread Starter
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    The two options I have are to tell the held string to adopt the new character sequence (copying it if necessary; and yes, I'm using RTTI, kind of), or to replace the held object. Incompatible storage semantics would never clash: when the types differ, the character data itself is copied.

    So, the two implementations of operator= would be:
    Code:
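    any_string &operator =(const any_string &rhs) {
      // Forward to the held string; the type of the held object stays the same.
      held->assign(rhs);
      return *this;
    }
    
    template <typename StrT>
    void holder<StrT>::assign(const any_string &rhs) {
      if(get_type() == rhs.held->get_type()) {
        // Same type: let the string use its own, possibly cheaper, assignment.
        held.assign(any_string_cast<StrT>(rhs));
      } else {
        // Different type: fall back to copying the character data.
        held.assign(rhs.c_str());
      }
    }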
    Code:
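    any_string &operator =(const any_string &rhs) {
      // Clone first, so that self-assignment and a throwing clone() are safe.
      holder_base *copy = rhs.held->clone();
      delete held;
      held = copy;
      return *this;
    }
    
    template <typename StrT>
    holder_base *holder<StrT>::clone() const {
      return new holder<StrT>(held);
    }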
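
    To make the observable difference between the two options concrete, here is a sketch that offers both as named member functions. assign_through and replace_held are hypothetical names chosen for this illustration, and buffered_string is again only a stand-in that forwards to std::string.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for a library string.
struct buffered_string {
    std::string s;
    buffered_string(const char *p) : s(p) {}
    buffered_string &assign(const char *p) { s.assign(p); return *this; }
    const char *c_str() const { return s.c_str(); }
};

class any_string {
    struct holder_base {
        virtual ~holder_base() {}
        virtual holder_base *clone() const = 0;
        virtual void assign(const char *p) = 0;
        virtual const char *c_str() const = 0;
        virtual const std::type_info &type() const = 0;
    };
    template <typename Str>
    struct holder : holder_base {
        Str held;
        explicit holder(const Str &s) : held(s) {}
        holder_base *clone() const override { return new holder(held); }
        void assign(const char *p) override { held.assign(p); }
        const char *c_str() const override { return held.c_str(); }
        const std::type_info &type() const override { return typeid(Str); }
    };
    holder_base *held_;
public:
    template <typename Str>
    any_string(const Str &s) : held_(new holder<Str>(s)) {}
    any_string(const any_string &rhs) : held_(rhs.held_->clone()) {}
    any_string &operator=(const any_string &) = delete;  // the open question
    ~any_string() { delete held_; }

    // Option 1: forward to the held string. Its type stays the same.
    void assign_through(const any_string &rhs) { held_->assign(rhs.c_str()); }

    // Option 2: replace the held object. Its type follows rhs.
    void replace_held(const any_string &rhs) {
        holder_base *copy = rhs.held_->clone();  // clone before deleting
        delete held_;
        held_ = copy;
    }

    const char *c_str() const { return held_->c_str(); }
    const std::type_info &type() const { return held_->type(); }
};
```

    After a.assign_through(b) the characters of a match b while a keeps its original storage semantics. After a.replace_held(b) both the characters and the storage semantics are those of b.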

  8. #8
    Fanatic Member riis's Avatar
    Join Date
    Nov 2001
    Posts
    551
    Originally posted by CornedBee
    Code:
    void holder<StrT>::assign(const any_string &rhs) {
      if(get_type() == rhs.held->get_type()) {
        held.assign(any_string_cast<StrT>(rhs));
      } else {
        held.assign(rhs.c_str());
      }
    }
    (From the first code block.)

    A few questions:
    * Does any_string have a c_str method? It's not mentioned in your first post.
    * What does any_string_cast do?
    * Why can't you also use the c_str method when both types are equal? It doesn't seem to matter. (See also next question.)
    * In this case the rhs and the result string won't be the same type. (You're just copying the buffer, not changing the type of the variable held.) Is this desired? This differs from the result in the second block.

    (BTW, it would be easier to understand if you don't have two variables with the same name: held )

    The second option seems much better to me, although it's done in two steps.

  9. #9

    Thread Starter
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    * Does any_string have a c_str method? It's not mentioned in your first post.
    Yes, it has. Like any other of my string classes, it's supposed to be interface-compatible with std::string. Again, it just passes the call on to the held object.
    * What does any_string_cast do?
    It's the equivalent of dynamic_cast for this special thing. It could be implemented like this:
    Code:
    // any_string's friend:
    template <typename Target>
    Target &any_string_cast(any_string &rhs) {
      // dynamic_cast needs the pointer type as its target.
      holder<Target> *ptr = dynamic_cast<holder<Target> *>(rhs.held);
      if(!ptr) {
        throw bad_any_string_cast();
      }
      return ptr->held;
    }
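
    For reference, a self-contained version of this cast could look as follows. It is a sketch under assumptions: the any_string here is stripped down to the constructor and destructor, copying is disabled, and bad_any_string_cast is assumed to derive from std::bad_cast.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Assumed exception type, thrown when the held type does not match.
struct bad_any_string_cast : std::bad_cast {};

class any_string {
    struct holder_base {
        virtual ~holder_base() {}   // makes the hierarchy polymorphic
    };
    template <typename Str>
    struct holder : holder_base {
        Str held;
        explicit holder(const Str &s) : held(s) {}
    };
    holder_base *held_;
public:
    template <typename Str>
    any_string(const Str &s) : held_(new holder<Str>(s)) {}
    any_string(const any_string &) = delete;
    any_string &operator=(const any_string &) = delete;
    ~any_string() { delete held_; }

    // The cast needs access to the private holder types.
    template <typename Target>
    friend Target &any_string_cast(any_string &rhs);
};

// Returns a reference to the held string if it has exactly type Target.
template <typename Target>
Target &any_string_cast(any_string &rhs) {
    any_string::holder<Target> *ptr =
        dynamic_cast<any_string::holder<Target> *>(rhs.held_);
    if (!ptr) {
        throw bad_any_string_cast();
    }
    return ptr->held;
}
```

    The cast succeeds only for the exact type that was stored. Asking for any other type yields a null pointer from dynamic_cast and therefore the exception.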
    * Why can't you also use the c_str method when both types are equal? It doesn't seem to matter. (See also next question.)
    Because it may be more efficient this way, for example, if the copied string employs refcounting. If the passed-in string is not a refcounted string too, the refcounting won't work.
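
    The efficiency argument can be sketched with a toy refcounted string. This refcount_string is hypothetical: it uses std::shared_ptr for the counting and leaves out copy-on-write entirely. shares_buffer_with and use_count exist only to make the effect visible.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical refcounted string: copies share one character buffer.
class refcount_string {
    std::shared_ptr<std::string> buf_;
public:
    refcount_string(const char *s) : buf_(std::make_shared<std::string>(s)) {}

    // Same-type assignment: share the buffer, no characters are copied.
    refcount_string &assign(const refcount_string &rhs) {
        buf_ = rhs.buf_;
        return *this;
    }
    // Assignment from raw characters: must allocate and copy.
    refcount_string &assign(const char *s) {
        buf_ = std::make_shared<std::string>(s);
        return *this;
    }

    const char *c_str() const { return buf_->c_str(); }
    long use_count() const { return buf_.use_count(); }
    bool shares_buffer_with(const refcount_string &other) const {
        return buf_ == other.buf_;
    }
};
```

    Assigning through c_str() always takes the second path, so two refcounted strings would end up with separate buffers even though they could have shared one.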

    * In this case the rhs and the result string won't be the same type. (You're just copying the buffer, not changing the type of the variable held.) Is this desired? This differs from the result in the second block.
    That's my question. Which of the two behaviours makes more sense? Keep in mind that the assign member ought to do the same thing as operator=. Should assign change the held object? Or should it be passed on?

  10. #10
    Frenzied Member Technocrat's Avatar
    Join Date
    Jan 2000
    Location
    I live in the 1s and 0s of everyones data streams
    Posts
    1,024
    Hmm I could feel my eyes glaze over as I read this. Damn your advanced stuff

    Seems to me that consistency in your program coding would decide how to solve this. If all your functions in that template deal with the held object, then you should follow the standards that you already set. If not then the reverse would seem true to me.

    I am not really sure if you gain or lose anything by doing it either way, since both go back to the buffer.

    Just my thoughts for what it's worth.
    MSVS 6, .NET & .NET 2003 Pro
    I HATE MSDN with .NET & .NET 2003!!!

    Check out my sites:
    http://www.filthyhands.com
    http://www.techno-coding.com


  11. #11

    Thread Starter
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    Tech, that fits with the only other real answer I got. Thanks, I think I'll do it that way.

  12. #12
    Frenzied Member Technocrat's Avatar
    Join Date
    Jan 2000
    Location
    I live in the 1s and 0s of everyones data streams
    Posts
    1,024
    Glad to help you for a change

    I have to ask........why are you the lion king now?


  13. #13

    Thread Starter
    Kitten CornedBee's Avatar
    Join Date
    Aug 2001
    Location
    In a microchip!
    Posts
    11,594
    Because I saw the movie again (after a loooong time) and became completely addicted.

  14. #14
    Frenzied Member Technocrat's Avatar
    Join Date
    Jan 2000
    Location
    I live in the 1s and 0s of everyones data streams
    Posts
    1,024
    *shrug* who am I to argue with you

