On the other hand using templates in C++ involves an additional abstraction layer with virtual method calls, while in C that's just a simple function call like any other
No. Templates are a compile-time mechanism; they have nothing to do with virtual calls, RTTI or OOP.
The most common problem with C++ that it is OOP, meaning it is pretty heavy on memory allocation, calling new and delete all the time, repeatedly.
This is the mark of bad C++ programming. C++ programmers of average skill don't trigger a reallocation every time they add elements to a vector... thankfully! (Basically, instead of stupidly doing push_back in a for loop, just call reserve before the loop, and you'll eliminate a lot of reallocations with all their cost.)
while debugging C++ code is a major PITA, and not just because of the name mangling
On this one I agree, but in my experience using LLVM's libc++ instead of the horrible GNU libstdc++ helps A LOT: libc++ doesn't do one-line functions that just call other one-line functions, in files that mix tabs and spaces for indentation!
As for sources, I've read more than one across the internet, but you know how benchmarks are (very situational, they don't show much, if anything, of real life, etc.)...
Here is one, though.
The remark about benchmarks obviously applies to the rest of this message.
Also, your example is *not* measuring only the difference between sort and qsort, as it includes I/O, notably from the shitty streambuf lib. You're also not using the same comparison in both cases: in one of them you rely on std::less, in the other one you do "a - b". One of the important differences here is this sentence from the qsort man page:
If two members compare as equal, their order in the sorted array is undefined.
So your C code is basically going to give unspecified results: each time two values are equal, the order you get depends on the libc. On top of that, "a - b" overflows for ints that are far enough apart, which random 32-bit values will be, and signed overflow is undefined behavior proper. This is likely to explain the behavior difference you're talking about.
In this trivial case, that is only going to impact performance, but as soon as you use qsort with more complex comparisons it might have other implications.
The fact that templates take more time to compile is not news either, but when I compare performance I usually don't talk about compile time; it is pretty irrelevant except for debugging purposes. It is usually only a problem when one uses boost, which is both widely loved and widely despised (I'm in the second category: this bunch of libs is just over-engineering that will turn easy-to-debug problems into serious pains in your ass, not to mention the infamous compile times... it does bring some interesting features, but like many people I still prefer to avoid it).
Here is what I get with code that really compares the languages and their features, instead of using the very ugly stream shit that nobody serious would use when in need of performance (despite this benchmark being pretty poor).
Data is generated on a Linux box with:
for i in $(seq 1 10000); do od -A n -t d -N 4 /dev/urandom; done > /tmp/data
The data is then pasted into the source code and edited to add ',' at the end of each line (in my case with vim, but any editor will do).
C code:
#include <stdio.h>
#include <stdlib.h>
#define LEN(a) (sizeof(a)/sizeof(a[0]))
/* compares by subtraction; the result overflows when the two ints are far apart */
int cmp(const void *a, const void *b) { return *((const int*)a) - *((const int*)b); }
int main()
{
    int s[] = { /* data generated by urandom and inserted in source code */ };
    qsort(s, LEN(s), sizeof(int), cmp);
    for(int i = 0; i < LEN(s); i++)
        printf("%d ", i); /* as measured: this prints the index i, not s[i] */
    printf("\n");
}
Timings:
% clang test.c -o cc && time (for i in $( seq 1 10000 ); do ./cc > /dev/null; done)
( for i in $( seq 1 10000 ); do; ./cc > /dev/null; done; ) 17,82s user 2,74s system 102% cpu 20,075 total
% clang -Os test.c -o cc && time (for i in $( seq 1 10000 ); do ./cc > /dev/null; done)
( for i in $( seq 1 10000 ); do; ./cc > /dev/null; done; ) 15,47s user 2,88s system 102% cpu 17,874 total
C++ code:
#include <algorithm>
#include <iterator>
#include <stdio.h>
int main()
{
    int s[] = { /* data generated by urandom and inserted in source code */ };
    std::sort( std::begin( s ), std::end( s ), []( int a, int b ){ return a < b; } );
    for( int i : s )
        printf("%d ", i);
    printf("\n");
}
Timings:
% clang++ test.cpp -o cpp && time (for i in $( seq 1 10000 ); do ./cpp > /dev/null; done)
( for i in $( seq 1 10000 ); do; ./cpp > /dev/null; done; ) 25,15s user 4,61s system 101% cpu 29,278 total
% clang++ -Os test.cpp -o cpp && time (for i in $( seq 1 10000 ); do ./cpp > /dev/null; done)
( for i in $( seq 1 10000 ); do; ./cpp > /dev/null; done; ) 15,13s user 3,50s system 102% cpu 18,143 total
So, you're right, it seems a bit slower. Not twice as slow, though. On my system there's only a 0.269s difference over the 10000 runs, which means the C++ version, which does not have UB (unlike the C one), runs 1.5% slower (in the -Os build). Fixing the UB in the C code would slow it down a lot, as this involves a condition. Let's give it a try:
#include <stdio.h>
#include <stdlib.h>
#define LEN(a) (sizeof(a)/sizeof(a[0]))
/* never returns 0: equal values are reported as "less than" */
int cmp(const void *a, const void *b) { int ret = *((const int*)a) - *((const int*)b); return ret == 0 ? -1 : ret; }
int main()
{
    int s[] = { /* data generated by urandom and inserted in source code */ };
    qsort(s, LEN(s), sizeof(int), cmp);
    for(int i = 0; i < LEN(s); i++)
        printf("%d ", i); /* as measured: this prints the index i, not s[i] */
    printf("\n");
}
Timings:
% clang test.c -o cc && time (for i in $( seq 1 10000 ); do ./cc > /dev/null; done)
( for i in $( seq 1 10000 ); do; ./cc > /dev/null; done; ) 17,17s user 3,00s system 102% cpu 19,691 total
% clang -Os test.c -o cc && time (for i in $( seq 1 10000 ); do ./cc > /dev/null; done)
( for i in $( seq 1 10000 ); do; ./cc > /dev/null; done; ) 15,87s user 2,88s system 102% cpu 18,271 total
Now we have an inverted difference: C++ std::sort is faster (in the -Os build) by 0.128s over the 10000 runs, which is 0.7%.
Those differences in time *are* negligible, but they show that, with well-defined behavior, C++ is indeed faster. It's also easier to read, imo.
As for build times:
% time clang -Os test.c
clang -Os test.c 0,06s user 0,00s system 99% cpu 0,068 total
% time clang++ -Os test.cpp
clang++ -Os test.cpp 0,31s user 0,02s system 99% cpu 0,335 total
As expected, C++ takes longer to compile.