English version of this page
На главную страницу
Официальный сайт кафедры Математической теории интеллектуальных систем и лабораторий Проблем теоретической кибернетики и Математичеких методов искусственного интеллекта механико-математического факультета МГУ им. М. В. Ломоносова

Сотрудники :: Гасанов Эльяр Эльдарович :: Публикации Гасанова Э.Э.

Instantly solvable search problems

Eljar E. Gasanov
Moscow State University



This paper deals with the investigation of search time in database. The search time is very important parameter for many information systems (for example, the real-time systems). Thus, the problem of construction of algorithms minimizing the search time for the concrete information system is very actual. The new very fast on the average algorithms are suggested in the paper. These algorithms solve some geometric search problems. The estimations for the time complexity of these algorithms are obtained.




Database theory; information search complexity; information search modelling; fast algorithms.




There are two approaches to construction of algorithms solving information search problems. The first approach supposes the creation of universal algorithms having wide range of application. Usually this universality leads to some loss of efficiency. The second approach supposes the creation of effective algorithms but these algorithms can be used in a narrow range. Both approaches have a right to existence. Usually the universal algorithms of organization and search of data are used in the database systems, intended for extensive commercial application. When the efficiency of the universal algorithms do not satisfies in some special databases then the necessity of construction of special data organization and search algorithms appears. On the other hand availability of the great number of effective algorithms, which can be used in the various situations, can allow to realize the third approach. This approach consists in the creation of an expert system, which can help for each search problem to find suitable effective algorithms from available ones in the expert system database. Thus, the third approach unites advantages of the first and second ones: on the one hand it can ensure applicability for a wide class of problems but on the other hand it can ensure high efficiency. Perspective of the third approach increases the importance of effective algorithms construction in spite of the narrowness of their application.


In this paper the search algorithms efficiency means the average search time. The average search time is very important parameter for all real-time systems. Therefore the problem of construction of fast search algorithms is very actual. Two new algorithms for solving two geometric search problems are suggested in the paper. These algorithms can be very useful for computer-aided design and computer graphics systems.


Data is analysed and pre-processed before using the data by search algorithm. These steps are doing to achieve the optimal time of the search.




The following mathematical model of information search problem is suggested in the paper .


Let $X$ and $Y$ be some sets. Call $X$ a set of queries and $Y$ a set of objects to be found (or required objects). Let $\rho $ be a relation on $X\times Y$, which we call search relation. Say a required object $y\in Y$ satisfies query $x\in X$ if $x\rho y$. Let $V$ be some finite subset of $Y$.


Call triplet $I=\langle X,V,\rho\rangle $ an information search problem (ISP). ISP $I=\langle X,V,\rho\rangle $ consists in enumeration for any query $x\in X$ all that and only that objects $y\in V$, such that $x\rho y$.


Let $\langle X,\sigma ,{\bf P}\rangle $ be a probability space under $X$, where $\sigma $ is the algebra of subsets of $X$, ${\bf P}$ is probability measure on $\sigma $.


Let $A$ be some algorithm, which solves ISP $I$.


$T(A,x)$ is solving time of $I$ by $A$ on query $x$.


$T(A)={\bf M}\, T(A,x)$ – average solving time of $I$.


$T(I)=\inf T(A)$, where $\inf$ is taken on the set of all algorithms, solving problem $I$.


A new mathematical model of the search algorithm was introduced by the author (Gasanov, 1991). The notions of algorithm, which solves ISP $I$, and values $T(A,x)$ and $T(A)$ can be formal defined in this model. But we do not describe this model because the limit of space do not allow to do it. We notice only that some schemes are corresponded to the search algorithms, and these schemes are directed graphs. The required objects from $Y$ are corresponded to some nodes of this graph, and some special functions are corresponded to edges or nodes of the graph. The range of definition of these functions is the set of queries X. The complexity of these schemes is defined so that it reflects the average search time of correspondent algorithms.


Let $O(y,\rho )=\{ x\in X: x\rho y \}.$


The following theorems are obtained.


Theorem 1   For any ISP $I=\langle X,V,\rho\rangle $

\begin{displaymath}T(I)\ge \sum_{y\in V} {\bf P}(O(y,\rho )).\end{displaymath}

Say $I=\langle X,V,\rho\rangle $ is an instantly solvable search problem if there exist algorithm $A$, solving $I$, such that

\begin{displaymath}T(I)\le \sum_{y\in V} {\bf P}(O(y)) + c,\end{displaymath}

where $c$ is a constant, independent of number of elements of set V.


Theorem 2   Let $X=(0,1]$, $Y=(0,1]$, $V=\{y_1,\ldots ,y_k\}\subseteq Y$, $\rho : x \rho y \Longleftrightarrow x=y.$ Let for each $B\subseteq X\ $ ${\bf P}(B)=\int_B p(x)dx$, where $p(x)$ is probability density function, and $p(x)\le c=constant$. Then ISP $I=\langle X,V,\rho\rangle $ (which is called problem of search of identical objects) is instantly solvable search problem, and

\begin{displaymath}1 < T(I) < 2.\end{displaymath}


Theorem 3   Let $Y={[0,1]}^n$, $X=\{x=(u_1, v_1, \ldots , u_n, v_n): 0 < u_i\le v_i\le 1,
i=\overline{1,n}\}$, $V=\{y_1,\ldots ,y_k\}\subseteq Y$, $\rho $ – relation on $X\times Y$ such that for $x=(u_1,v_1,\ldots ,u_n,v_n)\in X$ and $y=(y_1,\ldots ,y_n)\in Y\ $ $x\rho y
\Longleftrightarrow u_i\leq y_i \leq v_i,\ i=\overline{1,n}$. Let for each $B\subseteq X\ $ ${\bf P}(B)=\int_B p(x)dx$, where $p(x)$ is probability density function, and $p(x)\le c=constant$. Then ISP $I=\langle X,V,\rho\rangle $ (which is called $n$-dimensional interval search problem) is instantly solvable search problem, and

\begin{displaymath}\sum_{y\in V} {\bf P}(O(y,\rho )) \le T(I)
\le \sum_{y\in V} {\bf P}(O(y,\rho )) +4n+1.\end{displaymath}




Since we do not describe the mathematical model of the search algorithms we cannot give the formal proof of the above theorems. Therefore we give only the proof ideas and algorithms description.


The first theorem say that the average search time is not less than the average time of the answer enumeration. It is the evident and well known fact.


The result of the second theorem is founded on the following search algorithm.


So $V=\{ y_1,\ldots ,y_k\}$ is our ordered set of the objects to be found.


Let $S_i=\left(\frac{i-1}{k},\frac{i}{k}\right]$, $i=\overline{1,k}$.


Let $V_i=S_i\cap V,\ i=\overline{1,k}$.


Let $x\in (0,1]$ be some query.


On the first step we calculate the number $m=[x\cdot k]$. If $x\in V$ then $x\in V_m$. Therefore on the second step we do binary search in the set $V_m$ (if $V_m \neq \emptyset$). The estimation of the theorem 2 is based on the fact that the worst on the average situations is the case when $\vert V_1\vert=\vert V_2\vert=\cdots =\vert V_k\vert=1$. The last fact follows from the concavity of the logarithm function.


When $n=1$ the result of the third theorem is grounded on the following search algorithm.


Let $V=\{ y_1,\ldots ,y_k\}$ is our increasing ordered set of the objects to be found.


Let $m=[2\,c\cdot \log k]$.


Let $S=\{s_1,\ldots ,s_m\}$, where $s_i=\min \{j: y_j\geq
\frac{i}{m+1}\}$ (if $\{j: y_j\geq
\frac{i}{m+1}\}=\emptyset$ then $s_i=0$).


Let $x=(u,v)$ be some query from $X$.


On the first step we calculate the value $t=v-u$.


If $t< \frac{1}{m+1}$ we find $y'=\min \{y\in V: y\geq u\}$ with the help of the binary search. Then we look through the objects $y\in V$ from left to right beginning from $y'$ while $y\leq v$.


If $t\geq \frac{1}{m+1}$ we calculate the value $p=[u\cdot (m+1)]$. Obviously $\frac{n}{m+1}\in [u,v]$. Therefore we will look through the objects $y_j\in V$ from left to right beginning from $y_{s_p}$ while $y_j\leq v$ (we will do it only if $s_p\neq 0$). Then we will look through the objects $y_j\in V$ from right to left beginning from $y_{s_p -1}$ while $y_j\geq u$ (if $s_p = 0$ we will begin from $y_k$).


Finally notice that ${\bf P}(v-u\leq \frac{1}{m+1}) <
\frac{1}{\log k}$ whence it follows the result of the theorem 3 under condition $n=1$.


The case $n>1$ is proved by the method of dimension reduction.




The above-mentioned search problems are well known and widespread geometric search problems. They are used in the all computer graphics systems and CAD. For example, every time we are looking for points located in some window we deal with the 2-dimensional interval search problem.


The first above search problem is typical representative of problem of search of identical objects, which occur in the all database systems. Under certain conditions above algorithms can be used easily for solving of the well known key search problem. Therefore the results of this paper can be used in the large class of information search systems when the search time is critical parameter.




Gasanov E.E. (1991); On one mathematical model of information search; Discrete mathematics, v. 3, No 2 (pp. 69-76) (in Russian).

Proceedings of International Symposium on Intelligent Data Analysis (IDA-95). Baden-Baden, Germany. HAS Press, 1995.