文章导读

离散化

RonF02 2026年2月2日 95

今天的最后一个章节，写完睡觉。

首先在正式开始之前，我先提一下我产生过的疑惑：

离散化能不能被map<int, vector<int>>代替？

答案肯定是否定的~~（话说你说话为什么这么绕啊 (￢_￢;) ）~~
至于为什么？根据朋友中的二位大佬的切身经验，map<int, vector<int>>会被卡常，而且写起来也没有那么方便。也就是说并非绝对不能用，而是要尽可能避免的东西。

好的，接下来开始这篇文章的正题部分

离散化

离散化在这里特指整数的离散化，算法较为复杂，这里将逐行讲解。虽然本人更喜欢oi wiki的版本，但由于又跟着y总学习，于是将两者结合了一下。

我们先来看一下离散化有什么作用

离散化是用于值域特别大，而数据量不大，同时要用值作为下标时，把值映射到从0（或1）开始的自然数时采用的操作
例如：
a[] = {232324, -1919810, 114514, 0, 1, 1};
离散化后就变成了
a[] = {4, 0, 3, 1, 2, 2}; (此处采用0 base的映射)

我们会遇到以下两个问题：

a[]中可能有重复元素：需要去重
如何计算a[i]离散化后的值

很幸运的是，二者都有解决的方法。对于1而言，去重有std::unique可使用，而2又可以使用二分查找或lower_bound解决

我们来看示例代码

C++

    vector<int> alls = {232324, -1919810, 114514, 0, 1, 1}; // 存储原数组
    vector<int> tmp(alls); // 拷贝原数组
    sort(tmp.begin(), tmp.end()); // 将所有值排序
    tmp.erase(unique(tmp.begin(), tmp.end()), tmp.end()); // 去掉重复元素
    // 第二第三行实现的是数组排序 + 去重
    // unique函数会把重复的元素扔到数组最后，返回值是不含重复元素的尾迭代器
    // 也就是说，第三行相当于把排序后的有重复的元素从alls中erase掉了

    for (int i = 0; i < alls.size(); i ++ )
        // 二分求出 x 对应的离散化的值 
        // 找到第一个大于等于 x 的位置， 返回下标（默认0base，+1则为1base）
        alls[i] = lower_bound(tmp.begin(),tmp.end(),alls[i]) - tmp.begin();
        // alls[i] = find(alls[i], tmp); 
    
    for (auto i : alls) cout << i << ' '; // 输出4 0 3 1 2 2

    vector<int> alls = {232324, -1919810, 114514, 0, 1, 1}; // 存储原数组
    vector<int> tmp(alls); // 拷贝原数组
    sort(tmp.begin(), tmp.end()); // 将所有值排序
    tmp.erase(unique(tmp.begin(), tmp.end()), tmp.end()); // 去掉重复元素
    // 第二第三行实现的是数组排序 + 去重
    // unique函数会把重复的元素扔到数组最后，返回值是不含重复元素的尾迭代器
    // 也就是说，第三行相当于把排序后的有重复的元素从alls中erase掉了

    for (int i = 0; i < alls.size(); i ++ )
        // 二分求出 x 对应的离散化的值 
        // 找到第一个大于等于 x 的位置， 返回下标（默认0base，+1则为1base）
        alls[i] = lower_bound(tmp.begin(),tmp.end(),alls[i]) - tmp.begin();
        // alls[i] = find(alls[i], tmp); 
    
    for (auto i : alls) cout << i << ' '; // 输出4 0 3 1 2 2

这一版是oi wiki版的离散化代码，用的是lower_bound。如果想要复习二分的话，y总的二分查找的模板如下：

C++

// 二分求出 x 对应的离散化的值
// 找到第一个大于等于 x 的位置， 返回下标（默认0base，r+1则为1base）
int find(int x, vector<int> &alls) 
{
    int l = 0, r = alls.size() - 1;
    while (l < r) 
    {
        int mid = l + r >> 1;
        if (alls[mid] >= x) r = mid;
        else l = mid + 1;
    }
    return r;
}

// 二分求出 x 对应的离散化的值
// 找到第一个大于等于 x 的位置， 返回下标（默认0base，r+1则为1base）
int find(int x, vector<int> &alls) 
{
    int l = 0, r = alls.size() - 1;
    while (l < r) 
    {
        int mid = l + r >> 1;
        if (alls[mid] >= x) r = mid;
        else l = mid + 1;
    }
    return r;
}

关于二分的详细内容可查看整数二分

区间和

题目出处区间和。可能有种似曾相识的感觉，因为我们在差分这个章节已经见到过了。但是我们这次不能单纯地去开数组，因为数据范围 太！大！了！1e9的范围根本开不了前缀和数组，内存会爆的。于是我们就用到了离散化。

在这道题目中，我们采用离线的操作进行离散化。因为我们要知道所有的关键节点（经过修改的点和查询中的l和r）后才能建立正确的映射。

先看我们的储存代码

C++

for (int i = 0; i < n; i ++ )
{
    int x, c;
    cin >> x >> c;
    add.push_back({x, c});

    alls.push_back(x);
}

for (int i = 0; i < m; i ++ )
{
    int l, r;
    cin >> l >> r;
    query.push_back({l, r});
    
    alls.push_back(l);
    alls.push_back(r);
}

for (int i = 0; i < n; i ++ )
{
    int x, c;
    cin >> x >> c;
    add.push_back({x, c});

    alls.push_back(x);
}

for (int i = 0; i < m; i ++ )
{
    int l, r;
    cin >> l >> r;
    query.push_back({l, r});
    
    alls.push_back(l);
    alls.push_back(r);
}

非常朴素的两个循环

然后来看离散化和前缀和预处理的部分，核心只有两行，并没有按照oiwiki上的开一个tmp数组复制，因为这里我们可以在前缀和预处理操作中进行了查询操作。同时我们已经有了vector<PII>存储了需要的数据，也就不存在备份数据防止污染的问题了。

C++

sort(alls.begin(), alls.end());
alls.erase(unique(alls.begin(), alls.end()), alls.end());

vector<int> a(alls.size() + 1), s(alls.size() + 1);

for (auto item : add)
{
    int x = lower_bound(alls.begin(), alls.end(), item.first) - alls.begin() + 1;
    a[x] += item.second;
}

for (int i = 1; i <= alls.size(); i ++ ) s[i] = s[i - 1] + a[i];

sort(alls.begin(), alls.end());
alls.erase(unique(alls.begin(), alls.end()), alls.end());

vector<int> a(alls.size() + 1), s(alls.size() + 1);

for (auto item : add)
{
    int x = lower_bound(alls.begin(), alls.end(), item.first) - alls.begin() + 1;
    a[x] += item.second;
}

for (int i = 1; i <= alls.size(); i ++ ) s[i] = s[i - 1] + a[i];

最后是查询操作，很简单好懂的前缀和操作，之前的文章也有提及

C++

for (auto item : query)
    {
        int l = lower_bound(alls.begin(), alls.end(), item.first) - alls.begin() + 1;
        int r = lower_bound(alls.begin(), alls.end(), item.second) - alls.begin() + 1;
        cout << s[r] - s[l - 1] << '\n';
    }

for (auto item : query)
    {
        int l = lower_bound(alls.begin(), alls.end(), item.first) - alls.begin() + 1;
        int r = lower_bound(alls.begin(), alls.end(), item.second) - alls.begin() + 1;
        cout << s[r] - s[l - 1] << '\n';
    }

终于写完这篇文章了，洗洗睡了。

2月3日补充

y总给了手写unique函数的方法

对于数组 a[] = {1, 1, 2, 3, 5, 6, 6};
不难发现需要获取的数要满足以下两个特征中的一个

是数组中的第一个数
a[i] != a[i – 1]

于是unique函数用双指针算法实现。其中第一个指针遍历所有的数，第二个指针保存当前存到了第几个数

C++

vector<int>::iterator unique(vector<int> &a)
{
    int j = 0;
    for (int i = 0; i < a.size(); i ++ )
        if (!i || a[i] != a[i - 1])
            a[j ++ ] = a[i];
    // a[0] ~ a[j - 1] 所有a中不重复的数
    return a.begin() + j;
}

vector<int>::iterator unique(vector<int> &a)
{
    int j = 0;
    for (int i = 0; i < a.size(); i ++ )
        if (!i || a[i] != a[i - 1])
            a[j ++ ] = a[i];
    // a[0] ~ a[j - 1] 所有a中不重复的数
    return a.begin() + j;
}

代码如上

查看评论列表

暂无评论

发表评论取消回复

访客未填写邮箱

[填写昵称邮箱后可以评论]

是先学算法，先刷题，还是先写日记呢？

加载中

离散化

离散化

区间和

2月3日补充

评论（0）

发表评论取消回复

插入代码

RonF02的博客

搜索

归档

个人信息

是先学算法，先刷题，还是先写日记呢？

加载中

离散化

离散化

区间和

2月3日补充

评论（0）

发表评论 取消回复

插入代码

RonF02的博客

搜索

归档

个人信息

发表评论取消回复