Changing std::sort at Google’s Scale and Beyond

2022-04-20T17:23:27+00:00

I don’t know of many technical articles I’ve read in my life this thorough, comprehensive, and well-explained.

Thanks for taking the time to write this!

2022-04-20T19:58:29+00:00

Typo: “… compare functions much comply …” should be “must comply”

2022-04-21T04:40:21+00:00

Beautifully written article. Thank you for that.

2022-05-11T03:41:35+00:00

Hi Danila Kutenin, great article. Moreover, great task to change std::sort.

I have been working on sorting algorithms as a hobby since last year.
Would you have a look at my projects?

https://github.com/aldo-gutierrez/bitmasksorter
https://github.com/aldo-gutierrez/bitmasksortercpp

Maybe the only new thing it adds is the bitmask algorithm described there, which determines how many bits actually differ across the data to be sorted, so that we can skip some work and sort faster.

Computing this bitmask takes less than 1% of the sorting time and can help choose the best algorithm. For example (not necessarily correct): if very few bits are used, just use Count Sort; if many bits are used, Radix Byte Sort; otherwise RadixBitSort or QuickBitSort. I haven’t read all the articles on sorting, so maybe this has already been considered.

Thanks.

2022-11-20T15:04:42+00:00

W.r.t. sorting floats, I wonder when C++ will get an IEEE 754 totalOrder predicate for doing the job. If/when they put one in, sorting floats could “safely” be done without type punning.

2023-03-22T15:58:10+00:00

Hi, I enjoyed the article. Thanks!
> I guess this is the reinforcement learning authors talk about in the patch.
May I ask the title of the paper that uses reinforcement learning to write the patch?

	#include <algorithm>
	#include <iostream>
	#include <random>
	#include <vector>

	int quadratic(int size) {
	  int num_solid = 0;
	  // gas means "infinity"
	  int gas = size + 1;
	  int comparison_count = 0;
	  std::vector<int> indices(size);
	  std::vector<int> values(size);
	  // indices are 0, 1, …, size - 1
	  // values are all infinite
	  for (int i = 0; i < size; ++i) {
	    indices[i] = i;
	    values[i] = gas;
	  }
	  // Enforce uniform input distribution!
	  std::random_device rd;
	  std::mt19937 g(rd());
	  std::shuffle(indices.begin(), indices.end(), g);
	  std::sort(indices.begin(), indices.end(), [&](int x, int y) {
	    comparison_count += 1;
	    // If both infinite, set left.
	    // Otherwise gas is always more
	    if (values[x] == gas && values[y] == gas) {
	      values[x] = num_solid++;
	      return true;
	    } else if (values[x] == gas) {
	      return false;
	    } else if (values[y] == gas) {
	      return true;
	    } else {
	      return values[x] < values[y];
	    }
	  });
	  return comparison_count;
	}

	int main(int argc, char** argv) {
	  std::cout << "N: comparisons\n";
	  for (int i = 100; i <= 6400; i *= 2) {
	    std::cout << i << ": " << quadratic(i) << "\n";
	  }
	  return 0;
	}
	/*
	N: comparisons
	100: 2773
	200: 10623
	400: 41323
	800: 162723
	1600: 645523
	3200: 2571123
	6400: 10262323
	*/

	struct FieldNumberSorter {
	  bool operator()(const FieldDescriptor* left,
	                  const FieldDescriptor* right) const {
	    // Sorting by tag order.
	    return left->number() < right->number();
	  }
	};
	void Reflection::ListFieldsMayFailOnStripped(
	    const Message& message, bool should_fail,
	    std::vector<const FieldDescriptor*>* output) const {
	  // Traverse all fields in their order of declaration.
	  for (int i = 0; i <= last_non_weak_field_index; i++) {
	    const FieldDescriptor* field = descriptor_->field(i);
	    if (FieldSize(message, field) > 0) {
	      output->push_back(field);
	    }
	  }
	  // Sort by their tag number
	  std::sort(output->begin(), output->end(), FieldNumberSorter());
	}

	message GoogleMessage2 {
	  //       type   name    tag
	  //       vvvvvv vvvvvv  vvv
	  optional string field1 = 1;
	  optional int64 field3 = 3;
	  optional int64 field4 = 4;
	  optional int64 field30 = 30;
	  optional bool field75 = 75 [default = false];
	  // Just slightly out of order, 1, 3, 4, 30, 75, 6, 2
	  optional string field6 = 6;
	  optional bytes field2 = 2;
	  // …
	}

	template <class _Compare, class _ForwardIterator>
	unsigned __sort4(_ForwardIterator __x1, _ForwardIterator __x2,
	                 _ForwardIterator __x3, _ForwardIterator __x4,
	                 _Compare __c) {
	  unsigned __r = _VSTD::__sort3<_Compare>(__x1, __x2, __x3, __c);
	  if (__c(*__x4, *__x3)) {
	    swap(*__x3, *__x4);
	    ++__r;
	    if (__c(*__x3, *__x2)) {
	      swap(*__x2, *__x3);
	      ++__r;
	      if (__c(*__x2, *__x1)) {
	        swap(*__x1, *__x2);
	        ++__r;
	      }
	    }
	  }
	  return __r;
	}

	template <class _Compare, class _ForwardIterator>
	unsigned __sort5(_ForwardIterator __x1, _ForwardIterator __x2, _ForwardIterator __x3,
	                 _ForwardIterator __x4, _ForwardIterator __x5, _Compare __c) {
	  unsigned __r = _VSTD::__sort4<_Compare>(__x1, __x2, __x3, __x4, __c);
	  if (__c(*__x5, *__x4)) {
	    swap(*__x4, *__x5);
	    ++__r;
	    if (__c(*__x4, *__x3)) {
	      swap(*__x3, *__x4);
	      ++__r;
	      if (__c(*__x3, *__x2)) {
	        swap(*__x2, *__x3);
	        ++__r;
	        if (__c(*__x2, *__x1)) {
	          swap(*__x1, *__x2);
	          ++__r;
	        }
	      }
	    }
	  }
	  return __r;
	}

	const unsigned __limit = 8;
	unsigned __count = 0;
	for (_RandomAccessIterator __i = __j + difference_type(1); __i != __last; ++__i) {
	  if (__comp(*__i, *__j)) {
	    value_type __t(_VSTD::move(*__i));
	    _RandomAccessIterator __k = __j;
	    __j = __i;
	    do {
	      *__j = _VSTD::move(*__k);
	      __j = __k;
	    } while (__j != __first && __comp(__t, *--__k));
	    *__j = _VSTD::move(__t);
	    if (++__count == __limit)
	      return ++__i == __last;
	  }
	  __j = __i;
	}

	void
	__introsort(_RandomAccessIterator __first, _RandomAccessIterator __last,
	            difference_type __depth) {
	  // …
	  while (true) {
	    if (__depth == 0) {
	      // Fallback to heap sort as Introsort suggests.
	      _VSTD::__partial_sort<_Compare>(__first, __last, __last);
	      return;
	    }
	    --__depth;
	    // …
	  }
	}

	template <typename _Number>
	inline _Number __log2i(_Number __n) {
	  _Number __log2 = 0;
	  while (__n > 1) {
	    __log2++;
	    __n >>= 1;
	  }
	  return __log2;
	}

	void __sort(_RandomAccessIterator __first, _RandomAccessIterator __last) {
	  difference_type __depth_limit = 2 * __log2i(__last - __first);
	  _VSTD::__introsort(__first, __last, __depth_limit);
	}

	std::vector<std::pair<int, int>> first_elements_equal{{1, 1}, {1, 2}};
	std::sort(first_elements_equal.begin(),
	          first_elements_equal.end(),
	          [](const auto& lhs, const auto& rhs) {
	            // Compare only by a part of sorted data.
	            return lhs.first < rhs.first;
	          });
	// Serialize or make assumptions about all data.
	// Wrong, might be either 1 or 2.
	assert(first_elements_equal[0].second == 1);
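
A minimal sketch of two ways to make the assertion above hold on every standard library, assuming the second component is an acceptable tie-breaker. The names `Pairs`, `SortWithTieBreaker`, `StableSortByFirst` and `Example` are hypothetical and only serve the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<int, int>>;

// Option 1: break ties explicitly, so no two distinct elements are
// equivalent under the comparator and the result is fully determined.
inline void SortWithTieBreaker(Pairs& v) {
  std::sort(v.begin(), v.end(), [](const auto& lhs, const auto& rhs) {
    if (lhs.first != rhs.first) return lhs.first < rhs.first;
    return lhs.second < rhs.second;
  });
}

// Option 2: keep the partial comparator, but use std::stable_sort, which
// preserves the input order of equivalent elements.
inline void StableSortByFirst(Pairs& v) {
  std::stable_sort(v.begin(), v.end(), [](const auto& lhs, const auto& rhs) {
    return lhs.first < rhs.first;
  });
}

inline void Example() {
  Pairs v{{1, 2}, {1, 1}};
  SortWithTieBreaker(v);
  // Holds regardless of which std::sort implementation is used.
  assert(v[0].second == 1);
}
```

Option 1 changes the ordering contract, while option 2 keeps it and usually pays with extra memory or time, so which one fits depends on the call site.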

	create table example (id_1 integer, id_2 integer);

	-- Insert lots of equal id_1
	insert into example (id_1, id_2) values (1, 1);
	insert into example (id_1, id_2) values (0, 3);
	insert into example (id_1, id_2) values (0, 2);
	insert into example (id_1, id_2) values (0, 3);
	insert into example (id_1, id_2) values (0, 2);
	insert into example (id_1, id_2) values (0, 3);
	insert into example (id_1, id_2) values (1, 3);

	-- Order only by the first element, second
	-- is undefined for equal first elements.
	select * from example order by id_1;

	// _LIBCPP_DEBUG_RANDOMIZE_RANGE is std::shuffle

	template <class _RandomAccessIterator, class _Compare>
	void sort(_RandomAccessIterator __first, _RandomAccessIterator __last,
	          _Compare __comp) {
	  // Randomize range.
	  _LIBCPP_DEBUG_RANDOMIZE_RANGE(__first, __last);
	  typedef typename __comp_ref_type<_Compare>::type _Comp_ref;
	  // Call internal sort.
	  _VSTD::__sort<_Comp_ref>(_VSTD::__unwrap_iter(__first),
	                           _VSTD::__unwrap_iter(__last), _Comp_ref(__comp));
	}

	template <class _RandomAccessIterator, class _Compare>
	void nth_element(_RandomAccessIterator __first, _RandomAccessIterator __nth,
	                 _RandomAccessIterator __last, _Compare __comp) {
	  // Randomize range.
	  _LIBCPP_DEBUG_RANDOMIZE_RANGE(__first, __last);
	  typedef typename __comp_ref_type<_Compare>::type _Comp_ref;
	  // Call internal nth_element.
	  _VSTD::__nth_element<_Comp_ref>(__first, __nth, __last, __comp);
	  // Both sides of the partition do not have ordering requirements.
	  _LIBCPP_DEBUG_RANDOMIZE_RANGE(__first, __nth);
	  if (__nth != __last) {
	    _LIBCPP_DEBUG_RANDOMIZE_RANGE(++__nth, __last);
	  }
	}

	template <class _RandomAccessIterator, class _Compare>
	void partial_sort(_RandomAccessIterator __first, _RandomAccessIterator __middle,
	                  _RandomAccessIterator __last, _Compare __comp) {
	  // Randomize range.
	  _LIBCPP_DEBUG_RANDOMIZE_RANGE(__first, __last);
	  typedef typename __comp_ref_type<_Compare>::type _Comp_ref;
	  _VSTD::__partial_sort<_Comp_ref>(__first, __middle, __last, __comp);
	  // Trailing part does not have any ordering requirement.
	  _LIBCPP_DEBUG_RANDOMIZE_RANGE(__middle, __last);
	}

	static uint_fast64_t __seed() {
	  // static variable address may be randomized if built with ASLR.
	  static char __x;
	  return reinterpret_cast<uintptr_t>(&__x);
	}

	std::vector<int> values = { /* more than 10 values */ };
	std::nth_element(values.begin(), values.begin() + 9, values.end());
	int tenth_element = values[9];

	std::vector<int> values = { /* more than 10 values */ };
	std::partial_sort(values.begin(), values.begin() + 10, values.end());
	int tenth_element = values[9];
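
Both snippets above yield the same `tenth_element`, but they leave the rest of the range in different states: `std::partial_sort` sorts the first ten values, while `std::nth_element` only partitions around position 9. A small sketch of what can be relied on after `std::nth_element` (the helper name `KthSmallest` is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the k-th smallest value (0-based). After the call, values[k] holds
// the element that would be there if the range were sorted, everything before
// it is <= values[k], and everything after it is >= values[k]. The order
// inside each side is unspecified and may differ between implementations.
inline int KthSmallest(std::vector<int>& values, std::size_t k) {
  assert(k < values.size());
  std::nth_element(values.begin(),
                   values.begin() + static_cast<std::ptrdiff_t>(k),
                   values.end());
  return values[k];
}
```

Code that reads `values[0]` through `values[8]` after such a call and expects them in ascending order is relying on behavior the standard does not promise.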

	struct Data {
	  bool has_property;
	  // …
	};

	std::vector<Data> data(FillData());
	std::sort(data.begin(), data.end(), [](const Data& lhs, const Data& rhs) {
	  return lhs.has_property < rhs.has_property;
	});
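
When the key is a single boolean, as above, a partition expresses the same grouping without a full sort. The sketch below uses a hypothetical `Item` struct as a stand-in for `Data` (the `id` field exists only to make the result observable) and a hypothetical helper `GroupByProperty`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Item {  // hypothetical stand-in for Data
  bool has_property;
  int id;
};

// Moves all items without the property in front of those with it, which
// matches the false < true order produced by the std::sort call.
// std::stable_partition also keeps the relative order inside each group,
// so the result does not depend on the library implementation.
inline void GroupByProperty(std::vector<Item>& items) {
  auto without_property = [](const Item& item) { return !item.has_property; };
  std::stable_partition(items.begin(), items.end(), without_property);
  assert(std::is_partitioned(items.begin(), items.end(), without_property));
}
```

If the relative order inside the groups does not matter, `std::partition` does the same job in place with a single linear pass.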

	// We only care about the n entities with the highest scores.
	std::sort(vector.begin(), vector.end(),
	          HasHigherScore());
	vector.resize(n);

	// The search going up is known to be guarded but the search coming down isn't.
	// Prime the downward search with a guard.
	// __m still guards upward moving __i
	while (__comp(*__i, *__m))
	  ++__i;

	std::vector<double> vector_of_doubles(FillData());
	std::sort(vector_of_doubles.begin(), vector_of_doubles.end());
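
The call above is only well-defined when `operator<` is a strict weak ordering on the data. That breaks as soon as the vector contains a NaN, because every comparison with NaN is false, so NaN looks equivalent to all values while those values are not equivalent to each other. One possible guard, sketched under the assumption that NaNs may simply be moved to the end (the helper name `SortWithNaNsLast` is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Moves NaNs to the end, then sorts only the NaN-free prefix, so operator<
// is a valid strict weak ordering on the range that is actually sorted.
// Returns the number of non-NaN values.
inline std::size_t SortWithNaNsLast(std::vector<double>& v) {
  auto first_nan = std::partition(v.begin(), v.end(),
                                  [](double x) { return !std::isnan(x); });
  std::sort(v.begin(), first_nan);
  assert(std::is_sorted(v.begin(), first_nan));
  return static_cast<std::size_t>(first_nan - v.begin());
}
```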

	// In <algorithm>
	// Declare an inline/weak variable.
	extern void (*ExtremelyHackyCallThatYouWillNotOverride)()
	    __attribute__((weak)) = nullptr;

	// …

	// Before the sort call
	// …
	if (ExtremelyHackyCallThatYouWillNotOverride)
	  ExtremelyHackyCallThatYouWillNotOverride();
	_VSTD::__sort<_Comp_ref>(_VSTD::__unwrap_iter(__first), _VSTD::__unwrap_iter(__last), _Comp_ref(__comp));
	// …

	// main.cpp

	#include <algorithm>

	// Prints the stacktrace.
	void backtrace_dumper();

	int main(int argc, char** argv) {
	  ExtremelyHackyCallThatYouWillNotOverride = &backtrace_dumper;
	  return InvokeRealMain(argc, argv);
	}

	// __m is median. partition [__first, __m) < *__m and
	// *__m <= [__m, __last)
	//
	// Special handling for almost sorted targets
	while (true) {
	  while (__comp(*++__i, *__m));
	  while (!__comp(*--__j, *__m));
	  if (__i > __j) break;
	  swap(*__i, *__j);
	}
	swap(*__i, *__m);

	// We only care about the n entities with the highest scores.
	std::partial_sort(vector.begin(),
	                  vector.begin() + n,
	                  vector.end(),
	                  HasHigherScore());
	vector.resize(n);

	// Use Tukey's ninther technique or median of 3 for pivot selection.
	// Get the median out of 3 medians of 9 elements
	// (first, first + half, last - 1)
	// (first + 1, first + half - 1, last - 2)
	// (first + 2, first + half + 1, last - 3)
	_VSTD::__sort3<_Compare>(__first,
	                         __first + __half_len,
	                         __last - difference_type(1),
	                         __comp);
	_VSTD::__sort3<_Compare>(__first + difference_type(1),
	                         __first + (__half_len - 1),
	                         __last - difference_type(2),
	                         __comp);
	_VSTD::__sort3<_Compare>(__first + difference_type(2),
	                         __first + (__half_len + 1),
	                         __last - difference_type(3),
	                         __comp);
	_VSTD::__sort3<_Compare>(__first + (__half_len - 1),
	                         __first + __half_len,
	                         __first + (__half_len + 1),
	                         __comp);
	_VSTD::iter_swap(__first, __first + __half_len);

	// Ensures that *__x, *__y and *__z are ordered according to the comparator __c,
	// under the assumption that *__y and *__z are already ordered.
	template <class _Compare, class _RandomAccessIterator>
	inline void __partially_sorted_swap(_RandomAccessIterator __x, _RandomAccessIterator __y,
	                                    _RandomAccessIterator __z, _Compare __c) {
	  using value_type = typename iterator_traits<_RandomAccessIterator>::value_type;
	  bool __r = __c(*__z, *__x);
	  value_type __tmp = __r ? *__z : *__x;
	  *__z = __r ? *__x : *__z;
	  __r = __c(__tmp, *__y);
	  *__x = __r ? *__x : *__y;
	  // Store the saved value, otherwise __tmp would be lost.
	  *__y = __r ? *__y : __tmp;
	}

	template <class _Compare, class _RandomAccessIterator>
	inline void __sort3(_RandomAccessIterator __x1, _RandomAccessIterator __x2,
	                    _RandomAccessIterator __x3, _Compare __c) {
	  _VSTD::__cond_swap<_Compare>(__x2, __x3, __c);
	  _VSTD::__partially_sorted_swap<_Compare>(__x1, __x2, __x3, __c);
	}
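
The listing above calls `__cond_swap`, which is not shown. The following is a self-contained sketch of the same branchless idea on plain references rather than iterators; it is an illustration, not libc++'s code, and the names `CondSwap`, `PartiallySortedSwap` and `Sort3` are hypothetical.

```cpp
#include <cassert>

// Orders a and b so that a comes first. Both outputs are computed with
// conditional selects instead of an if/else, which compilers can often lower
// to conditional-move instructions for cheap-to-copy types.
template <class T, class Compare>
inline void CondSwap(T& a, T& b, Compare comp) {
  bool r = comp(a, b);
  T tmp = r ? a : b;
  b = r ? b : a;
  a = tmp;
}

// Requires that b and c are already ordered; afterwards a, b, c are ordered.
template <class T, class Compare>
inline void PartiallySortedSwap(T& a, T& b, T& c, Compare comp) {
  assert(!comp(c, b));  // precondition: b is not after c
  bool r = comp(c, a);
  T tmp = r ? c : a;  // the smaller of a and c
  c = r ? a : c;      // the larger of a and c
  r = comp(tmp, b);
  a = r ? a : b;
  b = r ? b : tmp;
}

// Sorts three values with exactly three comparisons and no
// data-dependent branches in the source.
template <class T, class Compare>
inline void Sort3(T& a, T& b, T& c, Compare comp) {
  CondSwap(b, c, comp);
  PartiallySortedSwap(a, b, c, comp);
}
```

Whether the selects actually become branch-free machine code depends on the compiler, the optimization level and the element type, so the benefit is worth measuring rather than assuming.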

Chapter 1. History

C++ history

How was the first std::sort really implemented?

A minor problem with quicksort

Moving on with quadratic behavior

Are modern C++ standard libraries actually compliant?

What is std::nth_element?

What happened to std::sort?

LLVM history

Theory: presortedness

LLVM history continues

Quadratic problem

How many real world cases got there into heap sort?

Chapter 2. Changing sorting is easy, isn’t it?

How to find all equal elements dependencies?

Seeding techniques

Partial vs nth danger

Which failures will you probably discover?

Goldens

Oh, crap, determinism

Side note: defaults in other languages are different and that’s probably good

Logical Bugs

Sorting of binary data

Sorting more than needed

C++ is hard

Not following strict weak ordering

Violation of irreflexivity and asymmetry

30 vs 31 elements. Happy execution vs SIGSEGV

Violation of transitivity of incomparability

Wait, but finding strict weak ordering violations takes cubic time

std::nth_element bug to randomization ratio is the highest. Here is why

How can you find bad sorting calls among hundreds of places in your codebase?

A very small danger note

Automating process by a small margin

Chapter 3. Which sorting are we replacing with?

A side note on distribution

Branch (mis)predictions for cheap comparisons

Heavy comparisons

Reinforcement learning for small sorts

Conclusion

How can you help?

Final thoughts

Acknowledgements

References
