查找算法

基本概念

查找表(数据结构)，分为静态查找表和动态查找表
关键字：唯一标示某个记录的一个key，如数组的下标
ASL ： $ASL=\sum_{i=1}^{n}P_iC_i$ ,其中Pi是概率，Ci是找到第i个数据元素需要比较的次数

顺序查找

查找成功: $ASL=\frac{n+1}{2}$

查找失败: $ASL=n+1$

一般线性表的顺序查找

设置哨兵：也就是在存储的时候将array[0]留空，然后就不需要在for中进行多余的判断

struct SSTable
{
	ElementType *elem;
    int tableLen;
};

int searchSeq(SSTable table, ElementType key)
{
	table.elem[0] = key;
    for (int i = table.tableLen; table.elem[i] != key; i--);
    return i;
}

有序表的顺序查找

由于线性表有序，因此，查找失败不需要比较到另一端，只需要比较到table[n] > key / table[n] < key即可，也就是假设比较等概率失败，那么概率为$\frac{1}{n+1}$

查找成功：$ASL=\frac{n+1}2$

查找失败：$ASL=\frac{n}2 + \frac{n}{n+1}$

折半查找

要求有序

查找成功：$ASL=log_2(n+1)-1$

分块查找

综合了顺序查找和折半查找的优点

分为索引表，以及查找表

索引表中的每个索引对应一个查找表，索引表有序，查找表可以有序，也可以无序

因此，ASL=索引表的ASL + 查找表的ASL

索引表可以使用二分查找，$ASL=log_2(b+1) + \frac{s+1}2$

B/B+树

B树又称为多路平衡查找树，要掌握！重点掌握

B树的特性如下

树中的每个节点至多有m棵子树
若根节点不是终端节点，则至少有两棵子树
除根节点以外的所有非叶节点至少有m/2棵子树
所有非叶节点的结构满足
- n为节点中的关键字数量，用来判断该节点所能容纳的关键字是否已满
- K为节点的关键字，满足 $K_n>K_{n-1}>...>K_1$
- P为指向子树的指针，且子树 $P_{i-1}$ 的所有节点小于 $K_i$
- 所有的叶节点都出现在同一层次上，并且不带信息

查找

B树的查找有如下步骤

在B树中找节点
在节点内找关键字，采用二分查找
如果查找失败，根据二分查找的失败条件，找到对应的子树，继续查找

插入

只能插入到叶节点中，如果为非叶节点，从根开始查找，直到查找到叶子节点，然后在叶子节点中插入

删除

$[n,P_0,K_1,P_1,...,K_n,P_n],n为节点的数量，事先声明 t,n_{max} = t * 2 - 1$

删除规则如下：

如果关键字k在节点x中
- 如果x是叶节点
  - 若x中的关键字个数>t-1,则删除k
  - 若关键字个数=t-1
    - 左右兄弟树的关键字个数>=t,则向相邻的节点树借一个节点，然后调整父母与子树的关系
    - 若左右兄弟不够借，则合并左右兄弟和父母节点，成为新的子树
- 如果x不是叶节点，则做以下操作
  - a：如果x节点中，前于 $K_n$ 的子树节点 $P_{n-1}$ 的关键字个数>t-1，则找出子树中的最大值，然后递归删除(使用删除规则)
  - b: 如上，如果后于 $K_n$ 的子树节点 $P_{n}$ 的关键字个数>t-1，则找出子树的最小值，然后递归删除
  - c：如果不满足a和b，且 $P_{n-1},P_{n}$ 的关键字个数=t-1，则合并两棵子树

// C++ program for B-Tree insertion
#include<iostream>
using namespace std;

// A BTree node
class BTreeNode
{
    int t;      // Minimum degree (defines the range for number of keys)
    int *keys;  // An array of keys, the size is 2 * t - 1
    BTreeNode **C; // An array of child pointers, the size is 2 * t
    int n;     // Current number of keys
    bool leaf; // Is true when node is leaf. Otherwise false
public:
    BTreeNode(int _t, bool _leaf);   // Constructor

    // A utility function to insert a new key in the subtree rooted with
    // this node. The assumption is, the node must be non-full when this
    // function is called
    void insertNonFull(int k);

    // A utility function to split the child y of this node. i is index of y in
    // child array C[].  The Child y must be full when this function is called
    void splitChild(int i, BTreeNode *y);

    // A function to traverse all nodes in a subtree rooted with this node
    void traverse();

    // A function to search a key in subtree rooted with this node.
    BTreeNode *search(int k);   // returns NULL if k is not present.

// Make BTree friend of this so that we can access private members of this
// class in BTree functions
    friend class BTree;
};

// A BTree
class BTree
{
    BTreeNode *root; // Pointer to root node
    int t;  // Minimum degree
public:
    // Constructor (Initializes tree as empty)
    BTree(int _t)
    {  root = NULL;  t = _t; }

    // function to traverse the tree
    void traverse()
    {  if (root != NULL) root->traverse(); }

    // function to search a key in this tree
    BTreeNode* search(int k)
    {  return (root == NULL)? NULL : root->search(k); }

    // The main function that inserts a new key in this B-Tree
    void insert(int k);
};

// Constructor for BTreeNode class
BTreeNode::BTreeNode(int t1, bool leaf1)
{
    // Copy the given minimum degree and leaf property
    t = t1;
    leaf = leaf1;

    // Allocate memory for maximum number of possible keys
    // and child pointers
    keys = new int[2*t-1];
    C = new BTreeNode *[2*t];

    // Initialize the number of keys as 0
    n = 0;
}

// Function to traverse all nodes in a subtree rooted with this node
void BTreeNode::traverse()
{
    // There are n keys and n+1 children, travers through n keys
    // and first n children
    int i;
    for (i = 0; i < n; i++)
    {
        // If this is not leaf, then before printing key[i],
        // traverse the subtree rooted with child C[i].
        if (leaf == false)
            C[i]->traverse();
        cout << " " << keys[i];
    }

    // Print the subtree rooted with last child
    if (leaf == false)
        C[i]->traverse();
}

// Function to search key k in subtree rooted with this node
BTreeNode *BTreeNode::search(int k)
{
    // Find the first key greater than or equal to k
    int i = 0;
    while (i < n && k > keys[i])
        i++;

    // If the found key is equal to k, return this node
    if (keys[i] == k)
        return this;

    // If key is not found here and this is a leaf node
    if (leaf == true)
        return NULL;

    // Go to the appropriate child
    return C[i]->search(k);
}

// The main function that inserts a new key in this B-Tree
void BTree::insert(int k)
{
    // If tree is empty
    if (root == NULL)
    {
        // Allocate memory for root
        root = new BTreeNode(t, true);
        root->keys[0] = k;  // Insert key
        root->n = 1;  // Update number of keys in root
    }
    else // If tree is not empty
    {
        // If root is full, then tree grows in height
        if (root->n == 2*t-1)
        {
            // Allocate memory for new root
            BTreeNode *s = new BTreeNode(t, false);

            // Make old root as child of new root
            s->C[0] = root;

            // Split the old root and move 1 key to the new root
            s->splitChild(0, root);

            // New root has two children now.  Decide which of the
            // two children is going to have new key
            // if the key less than root.key[0], insert the key to left child
            // if the key gather than root.key[0], insert the key to right child
            int i = 0;
            if (s->keys[0] < k)
                i++;
            s->C[i]->insertNonFull(k);

            // Change root
            root = s;
        }
        else  // If root is not full, call insertNonFull for root
            root->insertNonFull(k);
    }
}

// A utility function to insert a new key in this node
// The assumption is, the node must be non-full when this
// function is called
void BTreeNode::insertNonFull(int k)
{
    // Initialize index as index of rightmost element
    int i = n-1;

    // If this is a leaf node
    if (leaf == true)
    {
        // The following loop does two things
        // a) Finds the location of new key to be inserted
        // b) Moves all greater keys to one place ahead
        while (i >= 0 && keys[i] > k)
        {
            keys[i+1] = keys[i];
            i--;
        }

        // Insert the new key at found location
        keys[i+1] = k;
        n = n+1;
    }
    else // If this node is not leaf
    {
        // Find the child which is going to have the new key
        while (i >= 0 && keys[i] > k)
            i--;

        // See if the found child is full
        if (C[i+1]->n == 2*t-1)
        {
            // If the child is full, then split it
            splitChild(i+1, C[i+1]);

            // After split, the middle key of C[i] goes up and
            // C[i] is splitted into two.  See which of the two
            // is going to have the new key
            if (keys[i+1] < k)
                i++;
        }
        C[i+1]->insertNonFull(k);
    }
}

// A utility function to split the child y of this node
// Note that y must be full when this function is called
void BTreeNode::splitChild(int i, BTreeNode *y)
{
    // Create a new node which is going to store (t-1) keys
    // of y
    BTreeNode *z = new BTreeNode(y->t, y->leaf);
    z->n = t - 1;

    // Copy the last (t-1) keys of y to z
    for (int j = 0; j < t-1; j++)
        z->keys[j] = y->keys[j+t];

    // Copy the last t children of y to z
    if (y->leaf == false)
    {
        for (int j = 0; j < t; j++)
            z->C[j] = y->C[j+t];
    }

    // Reduce the number of keys in y
    y->n = t - 1;

    // Since this node is going to have a new child,
    // create space of new child
    for (int j = n; j >= i+1; j--)
        C[j+1] = C[j];

    // Link the new child to this node
    C[i+1] = z;

    // A key of y will move to this node. Find location of
    // new key and move all greater keys one space ahead
    for (int j = n-1; j >= i; j--)
        keys[j+1] = keys[j];

    // Copy the middle key of y to this node
    keys[i] = y->keys[t-1];

    // Increment count of keys in this node
    n = n + 1;
}

// Driver program to test above functions
int main()
{
    BTree t(3); // A B-Tree with minium degree 3
    t.insert(10);
    t.insert(20);
    t.insert(5);
    t.insert(6);
    t.insert(12);
    t.insert(30);
    t.insert(7);
    t.insert(17);

    cout << "Traversal of the constucted tree is ";
    t.traverse();

    int k = 6;
    (t.search(k) != NULL)? cout << "\nPresent" : cout << "\nNot Present";

    k = 15;
    (t.search(k) != NULL)? cout << "\nPresent" : cout << "\nNot Present";

    return 0;
}

散列表

散列函数的构造方法：

除留余数法：假设hash map长度为m，取一个质数p<=m,index=key % p
数字分析法：
直接定址法：取线性函数，index=a*key + b
平方取中法：
折叠法

处理冲突的方法

开放定址法

公式为 $H_i=(H(key)+d_i)\%m$

线性探测法：$d_i=1,2,3,….m-1$,当冲突发生，寻找临近的单元，直到遍历整个表，这样会造成大量元素堆积
平方探测法：取$d_i=1^2,2^2,3^2,….（m/2)^2$,m必须是一个能表示成4k*3的质数
再散列法：发生冲突，在对其进行哈希
伪随机序列法：使用伪随机数
- 拉链法：index对应链表，把所有index相同的值用链表存储起来，索引的时候找到链表然后再进行查找

字符串匹配

1.模式匹配

找到开头的字符串，然后再用一个指针继续匹配，匹配失败则返回开头，继续尝试寻找下一个开头

2.kmp

通过目标字符串构建状态数组

排序算法

算法的稳定性：如果两个相同的元素R1，R2，当排序之后不改变位置，那么则称排序稳定，否则称其为不稳定的

冒泡排序

冒泡算法让大的值下沉，小的值上升/小的值下沉，大的值上升


#include <iostream>
void bubble_sort(int a[], int length)
{
    if (length <= 0) return;
    for (int i = 0; i < length; i++)
    {
        for (int j = 0; j < length - 1; j++)
        {
            if (a[j] > a[j+1])
            {
                int temp = a[j];
                a[j] = a[j+1];
                a[j+1] = temp;
            }
        }
    }
}

int main()
{
    int a[5] = {1,8,0,4,5};
    bubble_sort(a, 5);
    for (int i : a)
    {
        std::cout << i << std::endl;
    }
    return 0;
}

插入排序

插入排序将当前元素向左移动到合适的位置，此算法有两个循环

循环1:规定范围

循环2:移动数据

直接插入排序：边比较边移动

折半插入排序：先找到合适的位置，然后再移动

希尔排序

分治法，把数组分为多个，部分排序，然后整体排序

选择排序

选择最小的，移动到前面

堆排序

在构建堆时候，不断下沉

在插入堆的时候，上浮

快速排序

快速排序也属于分治的排序算法

先切分，切分的数据以切分的元素为界，左小右大/左大右小，然后对左右的切分数组再切分

#include <iostream>

int array[5] = {5,2,7,4,1};

void exch(int i, int h)
{
    int temp = array[i];
    array[i] = array[h];
    array[h] = temp;
}

//base on array[lo], split the array, [ array[n] < array[lo] ---- array[lo] ----- array[n] > array[lo]
int split(int lo, int hi)
{
    int i = lo;
    int j = hi + 1;

    while(true)
    {
        //search from left, get the index of element which greater than the array[lo]
        while (array[++i] < array[lo])
            if (i == hi) break;

        //search from right, get the index of element which less than the array[lo]
        while (array[--j] > array[lo])
            if (j == lo) break;

        if (i >= j) break;

        //exchange
        exch(i,j);
    }
    exch(lo, j);
    return j;
}

//split
void quick_sort(int lo, int hi)
{
    if (hi <= lo) return;
    int j = split(lo, hi);

    //split left sub array
    quick_sort(lo, j-1);

    //split right sub array
    quick_sort(j+1, hi);
}

int main()
{
    quick_sort(0, 4);
    for (int i: array)
        std::cout << "i:" << i << std::endl;
    return 0;
}