Search & Copy Algorithms

查找算法

基本概念

  • 查找表(数据结构),分为静态查找表和动态查找表

  • 关键字:唯一标示某个记录的一个key,如数组的下标

  • ASL : $ASL=\sum_{i=1}^{n}P_iC_i$ ,其中Pi是概率,Ci是找到第i个数据元素需要比较的次数

顺序查找

查找成功:

查找失败:

一般线性表的顺序查找

设置哨兵:也就是在存储的时候将array[0]留空,然后就不需要在for中进行多余的判断

1
2
3
4
5
6
7
8
9
10
11
12
struct SSTable
{
ElementType *elem;
int tableLen;
};

int searchSeq(SSTable table, ElementType key)
{
table.elem[0] = key;
for (int i = table.tableLen; table.elem[i] != key; i--);
return i;
}

有序表的顺序查找

由于线性表有序,因此,查找失败不需要比较到另一端,只需要比较到table[n] > key / table[n] < key即可,也就是假设比较等概率失败,那么概率为$\frac{1}{n+1}$

查找成功:$ASL=\frac{n+1}2$

查找失败:$ASL=\frac{n}2 + \frac{n}{n+1}$

折半查找

要求有序

查找成功:$ASL=log_2(n+1)-1$

分块查找

综合了顺序查找和折半查找的优点

分为索引表,以及查找表

索引表中的每个索引对应一个查找表,索引表有序,查找表可以有序,也可以无序

因此,ASL=索引表的ASL + 查找表的ASL

索引表可以使用二分查找,$ASL=log_2(b+1) + \frac{s+1}2$

B/B+树

B树又称为多路平衡查找树,要掌握!重点掌握

B树的特性如下

  • 树中的每个节点至多有m棵子树
  • 若根节点不是终端节点,则至少有两棵子树
  • 除根节点以外的所有非叶节点至少有m/2棵子树
  • 所有非叶节点的结构满足
    • n为节点中的关键字数量,用来判断该节点所能容纳的关键字是否已满
    • K为节点的关键字,满足
    • P为指向子树的指针,且子树的所有节点小于
    • 所有的叶节点都出现在同一层次上,并且不带信息

查找

B树的查找有如下步骤

  • 在B树中找节点
  • 在节点内找关键字,采用二分查找
  • 如果查找失败,根据二分查找的失败条件,找到对应的子树,继续查找

插入

只能插入到叶节点中,如果为非叶节点,从根开始查找,直到查找到叶子节点,然后在叶子节点中插入

删除

删除规则如下:

  • 如果关键字k在节点x中
    • 如果x是叶节点
      • 若x中的关键字个数>t-1,则删除k
      • 若关键字个数=t-1
        • 左右兄弟树的关键字个数>=t,则向相邻的节点树借一个节点,然后调整父母与子树的关系
        • 若左右兄弟不够借,则合并左右兄弟和父母节点,成为新的子树
    • 如果x不是叶节点,则做以下操作
      • a:如果x节点中,前于的子树节点的关键字个数>t-1,则找出子树中的最大值,然后递归删除(使用删除规则)
      • b: 如上,如果后于的子树节点的关键字个数>t-1,则找出子树的最小值,然后递归删除
      • c:如果不满足a和b,且的关键字个数=t-1,则合并两棵子树
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
// C++ program for B-Tree insertion
#include<iostream>
using namespace std;

// A BTree node
class BTreeNode
{
int t; // Minimum degree (defines the range for number of keys)
int *keys; // An array of keys, the size is 2 * t - 1
BTreeNode **C; // An array of child pointers, the size is 2 * t
int n; // Current number of keys
bool leaf; // Is true when node is leaf. Otherwise false
public:
BTreeNode(int _t, bool _leaf); // Constructor

// A utility function to insert a new key in the subtree rooted with
// this node. The assumption is, the node must be non-full when this
// function is called
void insertNonFull(int k);

// A utility function to split the child y of this node. i is index of y in
// child array C[]. The Child y must be full when this function is called
void splitChild(int i, BTreeNode *y);

// A function to traverse all nodes in a subtree rooted with this node
void traverse();

// A function to search a key in subtree rooted with this node.
BTreeNode *search(int k); // returns NULL if k is not present.

// Make BTree friend of this so that we can access private members of this
// class in BTree functions
friend class BTree;
};

// A BTree
class BTree
{
BTreeNode *root; // Pointer to root node
int t; // Minimum degree
public:
// Constructor (Initializes tree as empty)
BTree(int _t)
{ root = NULL; t = _t; }

// function to traverse the tree
void traverse()
{ if (root != NULL) root->traverse(); }

// function to search a key in this tree
BTreeNode* search(int k)
{ return (root == NULL)? NULL : root->search(k); }

// The main function that inserts a new key in this B-Tree
void insert(int k);
};

// Constructor for BTreeNode class
BTreeNode::BTreeNode(int t1, bool leaf1)
{
// Copy the given minimum degree and leaf property
t = t1;
leaf = leaf1;

// Allocate memory for maximum number of possible keys
// and child pointers
keys = new int[2*t-1];
C = new BTreeNode *[2*t];

// Initialize the number of keys as 0
n = 0;
}

// Function to traverse all nodes in a subtree rooted with this node
void BTreeNode::traverse()
{
// There are n keys and n+1 children, travers through n keys
// and first n children
int i;
for (i = 0; i < n; i++)
{
// If this is not leaf, then before printing key[i],
// traverse the subtree rooted with child C[i].
if (leaf == false)
C[i]->traverse();
cout << " " << keys[i];
}

// Print the subtree rooted with last child
if (leaf == false)
C[i]->traverse();
}

// Function to search key k in subtree rooted with this node
BTreeNode *BTreeNode::search(int k)
{
// Find the first key greater than or equal to k
int i = 0;
while (i < n && k > keys[i])
i++;

// If the found key is equal to k, return this node
if (keys[i] == k)
return this;

// If key is not found here and this is a leaf node
if (leaf == true)
return NULL;

// Go to the appropriate child
return C[i]->search(k);
}

// The main function that inserts a new key in this B-Tree
void BTree::insert(int k)
{
// If tree is empty
if (root == NULL)
{
// Allocate memory for root
root = new BTreeNode(t, true);
root->keys[0] = k; // Insert key
root->n = 1; // Update number of keys in root
}
else // If tree is not empty
{
// If root is full, then tree grows in height
if (root->n == 2*t-1)
{
// Allocate memory for new root
BTreeNode *s = new BTreeNode(t, false);

// Make old root as child of new root
s->C[0] = root;

// Split the old root and move 1 key to the new root
s->splitChild(0, root);

// New root has two children now. Decide which of the
// two children is going to have new key
// if the key less than root.key[0], insert the key to left child
// if the key gather than root.key[0], insert the key to right child
int i = 0;
if (s->keys[0] < k)
i++;
s->C[i]->insertNonFull(k);

// Change root
root = s;
}
else // If root is not full, call insertNonFull for root
root->insertNonFull(k);
}
}

// A utility function to insert a new key in this node
// The assumption is, the node must be non-full when this
// function is called
void BTreeNode::insertNonFull(int k)
{
// Initialize index as index of rightmost element
int i = n-1;

// If this is a leaf node
if (leaf == true)
{
// The following loop does two things
// a) Finds the location of new key to be inserted
// b) Moves all greater keys to one place ahead
while (i >= 0 && keys[i] > k)
{
keys[i+1] = keys[i];
i--;
}

// Insert the new key at found location
keys[i+1] = k;
n = n+1;
}
else // If this node is not leaf
{
// Find the child which is going to have the new key
while (i >= 0 && keys[i] > k)
i--;

// See if the found child is full
if (C[i+1]->n == 2*t-1)
{
// If the child is full, then split it
splitChild(i+1, C[i+1]);

// After split, the middle key of C[i] goes up and
// C[i] is splitted into two. See which of the two
// is going to have the new key
if (keys[i+1] < k)
i++;
}
C[i+1]->insertNonFull(k);
}
}

// A utility function to split the child y of this node
// Note that y must be full when this function is called
void BTreeNode::splitChild(int i, BTreeNode *y)
{
// Create a new node which is going to store (t-1) keys
// of y
BTreeNode *z = new BTreeNode(y->t, y->leaf);
z->n = t - 1;

// Copy the last (t-1) keys of y to z
for (int j = 0; j < t-1; j++)
z->keys[j] = y->keys[j+t];

// Copy the last t children of y to z
if (y->leaf == false)
{
for (int j = 0; j < t; j++)
z->C[j] = y->C[j+t];
}

// Reduce the number of keys in y
y->n = t - 1;

// Since this node is going to have a new child,
// create space of new child
for (int j = n; j >= i+1; j--)
C[j+1] = C[j];

// Link the new child to this node
C[i+1] = z;

// A key of y will move to this node. Find location of
// new key and move all greater keys one space ahead
for (int j = n-1; j >= i; j--)
keys[j+1] = keys[j];

// Copy the middle key of y to this node
keys[i] = y->keys[t-1];

// Increment count of keys in this node
n = n + 1;
}

// Driver program to test above functions
int main()
{
BTree t(3); // A B-Tree with minium degree 3
t.insert(10);
t.insert(20);
t.insert(5);
t.insert(6);
t.insert(12);
t.insert(30);
t.insert(7);
t.insert(17);

cout << "Traversal of the constucted tree is ";
t.traverse();

int k = 6;
(t.search(k) != NULL)? cout << "\nPresent" : cout << "\nNot Present";

k = 15;
(t.search(k) != NULL)? cout << "\nPresent" : cout << "\nNot Present";

return 0;
}

散列表

散列函数的构造方法:

  • 除留余数法:假设hash map长度为m,取一个质数p<=m,index=key % p
  • 数字分析法:
  • 直接定址法:取线性函数,index=a*key + b
  • 平方取中法:
  • 折叠法

处理冲突的方法

开放定址法

公式为

  • 线性探测法:$d_i=1,2,3,….m-1$,当冲突发生,寻找临近的单元,直到遍历整个表,这样会造成大量元素堆积
  • 平方探测法:取$d_i=1^2,2^2,3^2,….(m/2)^2$,m必须是一个能表示成4k*3的质数
  • 再散列法:发生冲突,在对其进行哈希
  • 伪随机序列法:使用伪随机数
    • 拉链法 :index对应链表,把所有index相同的值用链表存储起来,索引的时候找到链表然后再进行查找

字符串匹配

1.模式匹配

找到开头的字符串,然后再用一个指针继续匹配,匹配失败则返回开头,继续尝试寻找下一个开头

2.kmp

通过目标字符串构建状态数组

排序算法

算法的稳定性:如果两个相同的元素R1,R2,当排序之后不改变位置,那么则称排序稳定,否则称其为不稳定的

冒泡排序

冒泡算法让大的值下沉,小的值上升/小的值下沉,大的值上升

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29

#include <iostream>
void bubble_sort(int a[], int length)
{
if (length <= 0) return;
for (int i = 0; i < length; i++)
{
for (int j = 0; j < length - 1; j++)
{
if (a[j] > a[j+1])
{
int temp = a[j];
a[j] = a[j+1];
a[j+1] = temp;
}
}
}
}

int main()
{
int a[5] = {1,8,0,4,5};
bubble_sort(a, 5);
for (int i : a)
{
std::cout << i << std::endl;
}
return 0;
}

插入排序

插入排序将当前元素向左移动到合适的位置,此算法有两个循环

循环1:规定范围

循环2:移动数据

直接插入排序:边比较边移动

折半插入排序:先找到合适的位置,然后再移动

希尔排序

分治法,把数组分为多个,部分排序,然后整体排序

选择排序

选择最小的,移动到前面

堆排序

在构建堆时候,不断下沉

在插入堆的时候,上浮

快速排序

快速排序也属于分治的排序算法

先切分,切分的数据以切分的元素为界,左小右大/左大右小,然后对左右的切分数组再切分

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
#include <iostream>

int array[5] = {5,2,7,4,1};

void exch(int i, int h)
{
int temp = array[i];
array[i] = array[h];
array[h] = temp;
}

//base on array[lo], split the array, [ array[n] < array[lo] ---- array[lo] ----- array[n] > array[lo]
int split(int lo, int hi)
{
int i = lo;
int j = hi + 1;

while(true)
{
//search from left, get the index of element which greater than the array[lo]
while (array[++i] < array[lo])
if (i == hi) break;

//search from right, get the index of element which less than the array[lo]
while (array[--j] > array[lo])
if (j == lo) break;

if (i >= j) break;

//exchange
exch(i,j);
}
exch(lo, j);
return j;
}

//split
void quick_sort(int lo, int hi)
{
if (hi <= lo) return;
int j = split(lo, hi);

//split left sub array
quick_sort(lo, j-1);

//split right sub array
quick_sort(j+1, hi);
}

int main()
{
quick_sort(0, 4);
for (int i: array)
std::cout << "i:" << i << std::endl;
return 0;
}