Does anyone have code that is causing a bottleneck now?
No, but here are a couple of examples from the C++ STL.

I needed a list to be sorted after tens of thousands of insertions. The original code kept the list in sorted order by scanning it to find the right position for each insert. The optimised version appended every element at the tail and then sorted the list once. The run time went from many minutes down to a couple of seconds.
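
For anyone who wants to see the difference, here is a minimal sketch of the two approaches. It is not the original code: the function names and the use of int elements are my own assumptions for illustration.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Original approach (hypothetical reconstruction): scan the list for the
// insertion point on every insert. Each scan is linear, so n inserts cost
// roughly n*n/2 comparisons overall.
std::list<int> build_sorted_by_scanning(const std::vector<int>& input) {
    std::list<int> result;
    for (int value : input) {
        // Find the first element greater than the new value.
        auto pos = std::find_if(result.begin(), result.end(),
                                [value](int x) { return x > value; });
        result.insert(pos, value);  // insert before that element
    }
    return result;
}

// Optimised approach: append everything at the tail, then sort once.
// std::list::sort performs approximately n log n comparisons.
std::list<int> build_sorted_by_appending(const std::vector<int>& input) {
    std::list<int> result;
    for (int value : input) {
        result.push_back(value);
    }
    result.sort();
    return result;
}
```

Both functions return the same sorted list. Only the amount of work differs, and the gap widens quickly as the number of elements grows.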

I needed a vector because random access was required. Again there were tens of thousands of insertions, with the number not known in advance. The original code added new elements to the end of the vector, which caused many memory reallocations as it grew. The optimised version appended to a list instead, and then used the list to populate a vector once, since the size was known by then. Again, minutes went down to a few seconds.