C++ set容器特性自动排序与去重机制-小浪学习网

<blockquote>C++ set容器基于红黑树实现，具备自动排序与去重特性，插入、删除、查找时间复杂度为O(log n)；可通过自定义比较函数对象或函数指针实现排序规则；与unordered_set相比，后者基于哈希表，平均操作时间复杂度O(1)，但无序且最坏情况性能下降；需有序或稳定性能时选set，仅需唯一性且追求平均速度时选unordered_set；批量插入时推荐使用范围insert减少红黑树调整，提升效率。</blockquote> <p><img src=”https://img.php.cn/upload/article/000/969/633/175564933129903.png” alt=”c++ set容器特性自动排序与去重机制”></p> <p>C++ set容器的核心特性在于其自动排序和去重机制。这意味着当你向set中插入元素时，它们会自动按照一定的规则（默认是升序）进行排序，并且任何重复的元素只会被保留一份。这使得set非常适合需要存储唯一且有序元素的场景。</p> <p><strong>自动排序与去重机制</strong></p> <p>set底层通常使用红黑树实现，这保证了插入、删除和查找操作的对数时间复杂度。当我们插入一个元素到set中时，set会首先检查该元素是否已经存在。如果存在，则插入操作会被忽略（去重）；如果不存在，则会将元素插入到红黑树的合适位置，以维持排序。</p> <p>这种自动排序和去重的特性，让set在某些场景下拥有独特的优势。例如，统计一段文本中不同单词的个数，或者对一组数据进行排序并去重等。</p> <p><span>立即学习</span>“<a href=”https://pan.quark.cn/s/6e7abc4abb9f” style=”text-decoration: underline !important; color: blue; font-weight: bolder;” rel=”nofollow” target=”_blank”>C++免费学习笔记（深入）</a>”；</p> <p><strong>副标题1：set的排序规则如何自定义？</strong></p> <p>默认情况下，set使用 <div class=”code” style=”position:relative; padding:0px; margin:0px;”><pre class=”brush:php;toolbar:false”><</pre><div class=”contentsignin”></div></div> 运算符进行排序。但有时我们需要自定义排序规则，比如按照元素的绝对值大小排序，或者按照字符串的长度排序。这时，我们可以通过以下两种方式来实现：</p> <ol><li> <strong>提供比较函数对象（functor）：</strong> 创建一个类，重载 <div class=”code” style=”position:relative; padding:0px; margin:0px;”><pre class=”brush:php;toolbar:false”>()</pre><div class=”contentsignin”></div></div> 运算符，定义自己的比较逻辑。然后，在创建set对象时，将该类的实例作为模板参数传递给set。</li></ol><div class=”code” style=”position:relative; padding:0px; margin:0px;”><pre class=’brush:cpp;toolbar:false;’>#include <iostream> #include <set> #include <cmath> struct AbsCompare { bool operator()(int a, int b) const { return std::abs(a) < std::abs(b); } }; int main() { std::set<int, AbsCompare> mySet = {-3, 1, -4, 2, -1}; for (int x : mySet) { std::cout << x << ” “; // Output: -1 1 2 -3 -4 } std::cout << std::endl; return 0; }</pre><div class=”contentsignin”></div></div><ol start=”2″><li> <strong>提供比较函数指针：</strong> 定义一个函数，接受两个参数，返回一个bool值，表示它们的排序关系。然后，在创建set对象时，将该函数的指针作为模板参数传递给set。</li></ol><div class=”code” style=”position:relative; padding:0px; margin:0px;”><pre class=’brush:cpp;toolbar:false;’>#include <iostream> #include <set> #include <string> bool compareStringLength(const std::string& a, const std::string& b) { return a.length() < b.length(); } int main() { std::set<std::string, bool(*)(const std::string&, const std::string&)> mySet(compareStringLength); mySet.insert(“apple”); mySet.insert(“banana”); mySet.insert(“kiwi”); for (const std::string& s : mySet) { std::cout << s << ” “; // Output: kiwi apple banana } std::cout << std::endl; return 0; }</pre><div class=”contentsignin”></div></div><p><strong>副标题2：set与unorde<a >red</a>_set的<a >区别</a>是什么？何时使用set，何时使用unordered_set？</strong></p> <p>set和unordered_set都是C++ STL中的关联容器，用于存储唯一元素。它们的主要区别在于底层实现和性能特点：</p> <ul> <li> <strong>set:</strong> 基于红黑树实现，元素有序存储，插入、删除和查找操作的时间复杂度为O(log n)。</li> <li> <strong>unordered_set:</strong> 基于哈希表实现，元素无序存储，插入、删除和查找操作的平均时间复杂度为O(1)，最坏情况下为O(n)。</li> </ul> <p><strong>何时使用set:</strong></p> <ul> <li>需要存储有序元素。</li> <li>对元素的顺序有要求。</li> <li>插入、删除和查找操作的性能要求稳定。</li> </ul> <p><strong>何时使用unordered_set:</strong></p> <ul> <li>不需要存储有序元素。</li> <li>对元素的顺序没有要求。</li> <li>追求更高的平均查找速度。</li> </ul> <p>选择哪个容器取决于具体的应用场景和性能需求。如果对元素的顺序有要求，或者需要稳定的性能，那么set是更好的选择。如果对元素的顺序没有要求，并且追求更高的平均查找速度，那么unordered_set是更好的选择。但需要注意，unordered_set 在处理哈希冲突时，性能可能会下降。</p> <p><strong>副标题3：如何高效地向set中插入大量数据？</strong></p> <p>向set中插入大量数据时，直接循环插入可能会导致性能瓶颈，因为每次插入都需要重新平衡红黑树。以下是一些提高插入效率的方法：</p> <ol><li> <strong>使用insert的范围版本：</strong> 如果数据已经存储在另一个容器（例如vector）中，可以使用insert的范围版本，一次性将所有元素插入到set中。这可以减少红黑树的调整次数。</li></ol><div class=”code” style=”position:relative; padding:0px; margin:0px;”><pre class=’brush:cpp;toolbar:false;’>#include <iostream> #include <set> #include <vector> int main() { std::vector<int> data = {5, 2, 8, 1, 9, 3, 7, 4, 6, 0}; std::set<int> mySet; mySet.insert(data.begin(), data.end()); for (int x : mySet) { std::cout << x << ” “; // Output: 0 1 2 3 4 5 6 7 8 9 } std::cout << std::endl; return 0; }</pre><div class=”contentsignin”></div></div><ol start=”2″> <li><p><strong>避免重复插入：</strong> 在插入之前，先检查元素是否已经存在于set中。如果存在，则避免重复插入。虽然set本身会去重，但提前检查可以避免不必要的红黑树调整。可以使用<div class=”code” style=”position:relative; padding:0px; margin:0px;”><pre class=”brush:php;toolbar:false”>set::count()</pre><div class=”contentsignin”></div></div> 方法判断元素是否存在。</p></li> <li><p><strong>使用移动语义：</strong> 如果元素是自定义类型，并且拷贝代价较高，可以使用移动语义来避免不必要的拷贝。这需要确保你的自定义类型支持移动构造函数和移动赋值运算符。</p></li> <li><p><strong>预分配内存（仅适用于某些底层实现）：</strong> 虽然set本身没有提供预分配内存的接口，但在某些底层实现中，预先分配足够的内存可以减少内存分配和释放的次数，从而提高性能。但这通常需要了解set的具体实现细节。</p></li> </ol> <p>选择哪种方法取决于数据的特点和具体的应用场景。通常来说，使用insert的范围版本是最简单且有效的方法。</p>

文章版权归作者所有，未经允许请勿转载。

THE END