Deterministic metric 1-median selection with very few queries
Ching-Lueh Chang
Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan. clchang@saturn.yzu.edu.tw
Part of this paper appears in Proceedings of the 27th International Computing and Combinatorics Conference (COCOON 2021).
Abstract
Given an $n$-point metric space $(M,d)$, metric 1-median asks for a point $p\in M$ minimizing $\sum_{x\in M} d(p,x)$.
We show that for each computable function $f\colon\mathbb{Z}^+\to\mathbb{Z}^+$ satisfying $f(n)=\omega(1)$, metric 1-median has a deterministic, $o(n)$-query, $o(f(n)\cdot\log n)$-approximation and nonadaptive algorithm.
Previously, no deterministic $o(n)$-query $o(n)$-approximation algorithms were known for metric 1-median.
On the negative side, we prove each deterministic $O(n)$-query algorithm for metric 1-median to be not $(\delta\log n)$-approximate for a sufficiently small constant $\delta>0$.
We also refute the existence of deterministic $o(n)$-query $O(\log n)$-approximation algorithms.
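1 Introduction
An $n$-point metric space $(M,d)$ is a size-$n$ set $M$ endowed with a distance function $d\colon M\times M\to[0,\infty)$ such that
• $d(x,y)=0$ if and only if $x=y$,
• $d(x,y)=d(y,x)$, and
• $d(x,y)+d(y,z)\ge d(x,z)$ (triangle inequality)
for all $x$, $y$, $z\in M$ [16].
Metric 1-median asks for a point $p\in M$ minimizing $\sum_{x\in M} d(p,x)$.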
Clearly, it has a brute-force $\Theta(n^2)$-time algorithm.
Furthermore, it generalizes the classical median selection [6] and can be generalized further to metric $k$-median clustering.
In social network analysis, metric 1-median asks for an actor with the maximum closeness centrality [17].
For all $c\ge 1$, a $c$-approximate 1-median of $(M,d)$ is a point $p\in M$ satisfying $\sum_{x\in M} d(p,x)\le c\cdot\min_{q\in M}\sum_{x\in M} d(q,x)$.
By convention, a $c$-approximation algorithm for metric 1-median must output a $c$-approximate 1-median of $(M,d)$.
A query inspects $d(x,y)$ for some $x$, $y\in M$.
An algorithm is nonadaptive if its $i$th query is independent of the answers to the first $i-1$ queries, for all $i>1$.
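The brute-force algorithm and the notion of an approximate 1-median described above can be illustrated with a minimal sketch. The function names `brute_force_1_median` and `approximation_ratio` are hypothetical and do not come from the paper; the sketch assumes the metric is given as a Python callable.

```python
def brute_force_1_median(points, dist):
    """Return (point, cost) minimizing the sum of distances to all points.

    Evaluates dist on every ordered pair, i.e., a quadratic number of queries.
    Ties are broken in favor of the earliest point in `points`.
    """
    best_point, best_cost = None, float("inf")
    for p in points:
        cost = sum(dist(p, x) for x in points)
        if cost < best_cost:
            best_point, best_cost = p, cost
    return best_point, best_cost


def approximation_ratio(candidate, points, dist):
    """Ratio of the candidate's total distance to the optimal total distance."""
    _, opt = brute_force_1_median(points, dist)
    cost = sum(dist(candidate, x) for x in points)
    # opt == 0 happens only for a single-point space, where every ratio is 1.
    return cost / opt if opt > 0 else 1.0
```

For example, on the points 0, 1, 2, 10 of the real line with `dist = lambda a, b: abs(a - b)`, the sketch returns the point 1 with total distance 11, and the point 10 has ratio 27/11.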
Write $d_G$ for the distance function induced by an undirected graph $G$.
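As an illustration of a graph-induced distance function, the following sketch computes shortest-path distances from one vertex by breadth-first search. It assumes an unweighted, connected, undirected graph given as an adjacency dictionary; the name `graph_distances` is hypothetical, and weighted graphs would need Dijkstra's algorithm instead.

```python
from collections import deque


def graph_distances(adj, source):
    """Return {vertex: number of edges on a shortest path from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```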
Indyk [11, 12] gives a Monte Carlo $O(n/\epsilon^2)$-time $(1+\epsilon)$-approximation algorithm for metric 1-median, where $\epsilon>0$.
His time complexity is optimal w.r.t. $n$.
When restricted to $\mathbb{R}^D$, metric 1-median has a Monte Carlo $O(D\cdot\exp(1/\epsilon^{O(1)}))$-time $(1+\epsilon)$-approximation algorithm [14].
The more general $k$-median clustering in metric spaces has streaming approximation algorithms [10], requires $\Omega(nk)$ time for $O(1)$-approximations [15] and is inapproximable to within $(1+2/e-\epsilon)$ unless $\text{NP}\subseteq\text{DTIME}(n^{O(\log\log n)})$ [13].
For $\mathbb{R}^D$ and graph metrics, a well-studied problem is to find the average distance from a query point to a finite set of points [1, 8, 9].
Deterministic -query computation is almost completely understood for metric 1-median: For all constants , the best approximation ratio achievable by deterministic -query and -query algorithms is and , respectively [2, 4, 18].
The same holds with “query” replaced by “time” and regardless of whether the algorithms can be adaptive [2, 4].
In contrast, we study the largely unknown deterministic $O(n)$- or $o(n)$-query computation.
An -query algorithm enjoys the strength of ignoring a fraction of points.
It is folklore that every point is an $(n-1)$-approximate 1-median.
Surprisingly, this is the current best upper bound for deterministic $o(n)$-query algorithms.
In particular, no deterministic $o(n)$-query $o(n)$-approximation algorithms are known for metric 1-median.
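The folklore bound can be checked with a short calculation. The derivation below is one standard argument, not taken from the paper; it assumes $n\ge 2$, writes $q$ for a 1-median of $(M,d)$ and $p$ for an arbitrary point, and yields the factor $n-1$.

```latex
\begin{align*}
\sum_{x\in M} d(p,x)
  &= \sum_{x\in M\setminus\{p\}} d(p,x)
   \le \sum_{x\in M\setminus\{p\}} \bigl(d(p,q)+d(q,x)\bigr)
   && \text{(triangle inequality)}\\
  &= (n-1)\,d(p,q) + \sum_{x\in M} d(q,x) - d(q,p)\\
  &= (n-2)\,d(p,q) + \sum_{x\in M} d(q,x)\\
  &\le (n-1)\sum_{x\in M} d(q,x)
   && \text{(since } d(p,q)\le \textstyle\sum_{x\in M} d(q,x)\text{)}.
\end{align*}
```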
Instead, we give a deterministic, $o(n)$-query, $o(f(n)\cdot\log n)$-approximation and nonadaptive algorithm for each computable function $f\colon\mathbb{Z}^+\to\mathbb{Z}^+$ satisfying $f(n)=\omega(1)$.
So, e.g., metric 1-median has a deterministic $o(n)$-query $o(\alpha(n)\cdot\log n)$-approximation algorithm for the very slowly growing inverse Ackermann function $\alpha(\cdot)$.
Our main technical discovery is that a -approximate 1-median of (where denotes restricted to ) is an -approximate 1-median of , for all and .
When is a uniformly random set of a sufficiently large size, an approximate solution to metric $k$-median clustering for is a good one for with high probability [7].
But our discovery is for any and is new.
Chang [3] shows that metric 1-median has a deterministic, $O(\exp(O(1/\epsilon))\cdot n\log n)$-time, $O(\exp(O(1/\epsilon))\cdot n)$-query, $(\epsilon\log n)$-approximation and nonadaptive algorithm, for all $\epsilon>0$.
So deterministic $O(n)$-query algorithms can be $(\epsilon\log n)$-approximate for each $\epsilon>0$.
Currently, the best lower bound against deterministic $O(n)$-query algorithms is that they cannot be $(4-\Omega(1))$-approximate [4].
So there is a huge gap between Chang’s [3] approximation ratio of $\epsilon\log n$ and the current best lower bound.
We close the gap by showing each deterministic $O(n)$-query algorithm for metric 1-median to be not $(\delta\log n)$-approximate for a sufficiently small constant $\delta>0$ (depending on the algorithm).
Our approach, sketched below, adversarially answers the queries of a deterministic $O(n)$-query algorithm Alg:
(I) Start with the complete graph on the point set.
(II) Mark all edges in an $O(1)$-regular expander graph as permanent.
(III) Repeat the following:
(1) Upon receiving a query for the distance between two points, find a shortest path between them and answer with the length of that path.
(2) Mark all edges of that path as permanent.
(3) For each vertex incident to too many permanent edges, remove all non-permanent edges incident to that vertex.
Intuitively, item (III3) keeps degrees small, thus forcing the output of Alg to have a large average distance to other points.
Because item (III1) answers a query by the length of a shortest path, items (III2)–(III3) must preserve all edges of that path (by marking them as permanent and not removing them) for consistency in answering future queries.
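The control flow of items (I)–(III) can be sketched as follows. This is only an illustration under loud assumptions: the class name `Adversary`, the parameter `max_permanent_degree` (standing in for "too many"), and the caller-supplied `seed_edges` are all hypothetical, and the usage example seeds with a cycle, which is not an expander. The sketch assumes the seed graph is connected, so every query has a path.

```python
from collections import deque


class Adversary:
    def __init__(self, n, seed_edges, max_permanent_degree):
        # (I) Start with the complete graph on vertices 0..n-1.
        self.adj = {v: set(range(n)) - {v} for v in range(n)}
        self.cap = max_permanent_degree
        # (II) Mark the seed edges (an expander in the paper) as permanent.
        self.permanent = {frozenset(e) for e in seed_edges}

    def _permanent_degree(self, v):
        return sum(1 for e in self.permanent if v in e)

    def _shortest_path(self, a, b):
        # Plain BFS in the current graph; assumes b is reachable from a.
        parent = {a: None}
        queue = deque([a])
        while queue:
            u = queue.popleft()
            if u == b:
                break
            for w in sorted(self.adj[u]):
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        path = [b]
        while path[-1] != a:
            path.append(parent[path[-1]])
        return path[::-1]

    def answer(self, a, b):
        path = self._shortest_path(a, b)              # (III1)
        for u, v in zip(path, path[1:]):              # (III2)
            self.permanent.add(frozenset((u, v)))
        for v in self.adj:                            # (III3)
            if self._permanent_degree(v) > self.cap:
                for w in list(self.adj[v]):
                    if frozenset((v, w)) not in self.permanent:
                        self.adj[v].discard(w)
                        self.adj[w].discard(v)
        return len(path) - 1
```

With `Adversary(6, [(i, (i + 1) % 6) for i in range(6)], 2)`, the query (0, 3) is answered by 1, after which vertices 0 and 3 lose their non-permanent edges; the query (0, 2) is then answered by 2, and repeating the query (0, 3) still yields 1 because its edge is permanent.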
Items (I) and (III1)–(III3) follow Chang’s [4] paradigm.
To prove a lower bound against Alg, we shall make the output of Alg a lot worse than a 1-median, presumably by identifying or planting a vertex with a sufficiently small average distance to other points.
However, Chang fails in this respect.
We overcome his problem by item (II), which allows a vertex to have an $O(\log n)$ average distance to other vertices.
An extension of our lower bound forbids each deterministic $o(n)$-query algorithm for metric 1-median to be $(f(n)\cdot\log n)$-approximate for some computable function $f$ satisfying $f(n)=\omega(1)$.
In particular, deterministic $o(n)$-query $O(\log n)$-approximation algorithms do not exist.
Previously, the best lower bound against deterministic -query algorithms is folklore and forbids to be -approximate for some . (Footnote 3: For a sketch of proof, answer all queries of by and put all points not involved in the queries to be extremely close to one another but extremely far away from ’s output and from the points involved in the queries.)
So previous works do not yet refute the existence of deterministic -query -approximation algorithms, where is the very slowly growing inverse Ackermann function.
Chang’s [5] adversarial method shows that metric 1-median has no deterministic -query -approximation algorithms that make each point involve in queries to .
But his adversary is rather naïve and does not seem to yield any unconditional lower bound such as ours.
2 Upper bound
Take an -point metric space and .
Define to be a 1-median of and , respectively, breaking ties arbitrarily.
Furthermore, pick and independently and uniformly at random from .
So is the average distance in .
Lemma 1.
Proof.
We have
∎
Lemma 2.
Proof.
By the optimality of ,
Clearly,
∎
For all ,
(1)
The next two lemmas constitute our main discovery.
Lemma 3.
For all and satisfying and , is an -approximate 1-median of .
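Theorem 6 ([3]).
For all constants $\epsilon>0$, metric 1-median has a deterministic, $O(\exp(O(1/\epsilon))\cdot n\log n)$-time, $O(\exp(O(1/\epsilon))\cdot n)$-query, $(\epsilon\log n)$-approximation and nonadaptive algorithm.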
Below is our main theorem.
Theorem 7.
For each computable function $f\colon\mathbb{Z}^+\to\mathbb{Z}^+$ satisfying $f(n)=\omega(1)$, metric 1-median has a deterministic, $o(n)$-query, $o(f(n)\cdot\log n)$-approximation and nonadaptive algorithm.
Proof.
Take any of size .
Applying Theorem 6 to , an -approximate 1-median of can be found deterministically and nonadaptively with queries.
By Lemma 5 (with ), is an -approximate 1-median of .
∎
Taking a very slowly growing $f$ (e.g., the iterated logarithm or the inverse Ackermann function), Theorem 7 allows deterministic $o(n)$-query algorithms to be very close to being $O(\log n)$-approximate.
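The shape of the algorithm behind Theorem 7 (fix a subset in advance, then solve 1-median on that subset only) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: brute force on the subset stands in for Theorem 6, so the sketch spends a number of queries quadratic in the subset size. The name `restricted_1_median` is hypothetical.

```python
def restricted_1_median(subset, dist):
    """Return (point, queries): a 1-median of `subset` under dist restricted
    to the subset, and the number of distance queries made.

    The queried pairs depend only on `subset`, never on earlier answers,
    so the procedure is deterministic and nonadaptive.
    """
    queries = 0
    best_point, best_cost = None, float("inf")
    for p in subset:
        cost = 0
        for x in subset:
            cost += dist(p, x)
            queries += 1
        if cost < best_cost:
            best_point, best_cost = p, cost
    return best_point, queries
```

For the 100 points 0, ..., 99 on the real line and the subset `list(range(0, 100, 10))`, the sketch returns the point 40 after 100 queries, far fewer than the 10,000 ordered pairs of the full space. How good such a point is for the full space is what the paper's lemmas quantify.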
3 Lower bound
Fix any deterministic -query algorithm Alg, where .
Then take a constant , where is such that -regular expander graphs exist.
By padding, assume the number of Alg’s queries to be exactly .
Adversary Adv in Fig. 1 answers the queries of Alg.
All graphs are assumed to be undirected.
As a remark, whenever an edge of a graph is marked as permanent, that edge is considered to be permanent in all graphs.
For example, an edge of marked as permanent in line 3 of Adv is considered to be permanent in lines 11–13, even though the latter processes rather than .
Similarly, although an edge marked as permanent by line 8 comes from by line 6, it is considered to be permanent in lines 11–13 as well.
Lemma 8.
For all , is a subgraph of .
Proof.
By line 1, is a subgraph of .
Assume as induction hypothesis that is a subgraph of .
By line 3 and the induction hypothesis, all edges of are permanent edges of .
By lines 9–14, all permanent edges of are in .
∎
Lemma 9.
For all , Adv’s answer to the th query of Alg equals .
Proof (included for completeness).
Let be Adv’s answer to the th query.
By lines 6–7, . (Footnote 4: As is an expander, by Lemma 8.)
By lines 9–14, is a subgraph of , implying .
In summary, .
By line 7, is the length of .
As is in by line 6, all edges of are permanent edges of by lines 8–14.
So by lines 9–14, exists in for all . (Footnote 5: Note that once an edge is marked as permanent, it cannot be removed by line 12.)
Therefore, the length of is at least (in fact, at least for all ).
In summary, .
∎
Lemma 10.
For each , each run of line 8 marks as permanent at most two edges incident to .
Proof (included for completeness).
In line 6, has at most two edges incident to .
∎
Let be the set of edges ever marked as permanent, and .
Denote by the output of Alg with all queries answered by Adv.
By padding dummy queries, assume without loss of generality that Alg queries for the distance between and each point in .
By lines 7–8, Adv answers each query of Alg by the length of a path whose edges are all in .
So for all , the answer to the th query is at least .
Therefore,
by Lemma 9, where .
This and the assumption that Alg queries for all distances between and the points in give
(5)
Consider the instant when the number of permanent edges incident to a vertex exceeds .
By Lemma 10, is incident to at most permanent edges at time .
Then lines 9–14 remove from all non-permanent edges incident to (and will not put them back to for any ).
So no more edges incident to will be marked as permanent after time .
In summary, has degree at most in .
In the above argument, can be any vertex whose number of incident permanent edges ever exceeds .
So has maximum degree at most . (Footnote 6: Clearly, a vertex whose number of incident permanent edges never exceeds will have degree in .)
So for all , at most vertices in can be within distance (inclusive) from .
Taking for a small constant depending on , .
I.e., at least vertices are of distance greater than from in .
So
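The counting step above rests on a standard fact: in a graph of maximum degree at most $\Delta$, at most $1+\Delta+\cdots+\Delta^r$ vertices lie within distance $r$ of any fixed vertex. The sketch below computes this bound and the largest radius whose ball is guaranteed to miss at least half of the vertices. The function names are hypothetical, and the paper's exact constants are not reproduced here.

```python
def ball_size_upper_bound(max_degree, radius):
    """Upper bound on the number of vertices within `radius` of a vertex
    in any graph with maximum degree at most `max_degree`."""
    return sum(max_degree ** i for i in range(radius + 1))


def largest_small_radius(n, max_degree):
    """Largest r such that the bound for radius r is at most n // 2, so at
    least half of n vertices are farther than r from any fixed vertex.
    Assumes n >= 2 and max_degree >= 1, so that radius 0 qualifies."""
    r = 0
    while ball_size_upper_bound(max_degree, r + 1) <= n // 2:
        r += 1
    return r
```

For a constant maximum degree, the returned radius grows logarithmically in n, which is the source of logarithmic lower bounds on average distance in bounded-degree graphs.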
For all and when line 6 picks , has at most one non-permanent edge.
Proof (included for completeness).
Write .
Assume for contradiction that and are both non-permanent when line 6 picks from , for some .
By line 1, has the edge .
But by the optimality of in line 6, cannot have the edge .
So there exists such that line 12 runs with in the th iteration of the loop in lines 4–15. (Footnote 7: Let be the smallest index such that does not have . Line 9 initializes to be , which has . So line 12 must remove from . This happens only by running line 12 with .)
Being non-permanent when line 6 picks from , and must have remained non-permanent throughout the first iterations (including the th iteration) of the loop in lines 4–15 (because of the irreversibility of permanence).
Therefore, when line 12 runs with in the th iteration of the loop in lines 4–15, or must be removed from .
By symmetry, assume to not have .
By lines 9–14 and as , cannot have , either.
As is picked from by line 6, must have (which is on ), a contradiction.
∎
As is -regular by line 2, line 3 marks edges as permanent by the handshaking lemma.
By Corollary 15, at most edges are ever marked as permanent by line 8.
To sum up, has at most edges.
So by the handshaking lemma, the average degree in is at most .
This and Markov’s inequality imply that at most vertices have degrees at least in .
As , at most vertices have degrees at least in .
∎
Before lines 6, 10 and 17, cnt increments.
∎
Lemma 20.
When Sim terminates, , , for all .
Proof.
Each query increases cnt by at most two in lines 4–11.
Lines 15–18 may also increase cnt.
Lines 6, 10, and 17 set to be cnt for some .
∎
Lemma 21.
If Alg is -approximate for metric 1-median, where , then Sim is a tame -query -approximation algorithm for metric 1-median.
Proof.
By Lemma 19, Sim simulates Alg with an injective renaming of points.
So, inheriting from Alg, Sim is -approximate and makes queries.
By Lemma 20 and lines 12 and 19 of Sim, Sim is tame.
∎
Theorem 22.
Each deterministic $o(n)$-query algorithm for metric 1-median fails to be $(f(n)\cdot\log n)$-approximate for some computable function $f\colon\mathbb{Z}^+\to\mathbb{Z}^+$ satisfying $f(n)=\omega(1)$.
Proof.
By Lemma 21, assume Alg to be tame without loss of generality (otherwise, prove the theorem against Sim instead of Alg).
Let be Alg’s output when the queries are answered by Adv with (resp., ) substituted by (resp., ).
By Lemma 11 with (resp., ) substituted by (resp., ),
(8)
where is a graph on as in Adv.
By Lemmas 16–17 with (resp., ) substituted by (resp., ), there exists satisfying
(9)
Equations (8)–(9) and the triangle inequality imply
(10)
Recall that .
Put all points in extremely close to : For all distinct , ,
and
(15)
It is not hard to see that is induced by the weighted graph obtained in the following way:
(1) Add all vertices in to .
(2) Add an edge between each and each neighbor (in ) of .
(3) Connect any two vertices in by an edge of weight , all other edge weights being .
As Alg is tame, for all , implying by equation (15).
So by Lemma 9, Adv answers queries consistently with .
We have
(17)
As Alg is tame, .
By equation (10), . (Footnote 8: For proving the theorem, we may assume without loss of generality.)
So is nonzero.
So .
Now,
This and equations (3.1)–(17) show to be no better than -approximate for some constant .
Clearly, .
So taking completes the proof except that may be uncomputable.
Gladly, has codomain by equation (15). (Footnote 9: Any graph on a subset of induces distances in . But equations (3.1)–(17) forbid as a distance.)
So we may pretend as if is Alg’s worst-case query complexity w.r.t. metrics with codomain .
This makes , and thus , computable.
∎
Corollary 23.
Metric 1-median has no deterministic $o(n)$-query $O(\log n)$-approximation algorithms.
Corollary 24.
Metric 1-median has no deterministic $o(n)$-query algorithms with an asymptotically best approximation ratio.
Proof.
Take any deterministic $o(n)$-query algorithm.
By Theorem 22, there exists a computable $f$ with $f(n)=\omega(1)$ forbidding that algorithm to be $(f(n)\cdot\log n)$-approximate.
But Theorem 7 asserts the existence of a deterministic $o(n)$-query $o(f(n)\cdot\log n)$-approximation algorithm.
∎
Appendix A Distances in expanders
It is well-known that an -regular expander graph on exists.
I.e., there exist constants and such that
(i) is -regular, and
(ii) for each of size at most , at least edges of are in .
Lemma 25.
For each nonempty of size at most ,
Proof.
For each ,
So is the set of vertices at level of the BFS tree rooted at . (Footnote 10: Generalize BFS in the obvious way to allow the root to be a set of vertices.)
Now fix any .
Because edges cannot cross non-adjacent levels of a BFS tree, .
By item (ii) (with replaced by and noting that has size at most ), at least edges of are in .
In summary, at least edges are in (and are thus incident to a vertex in ).
As is -regular, therefore, .
Hence
where the last equality uses the convergence of .
∎
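The proof above relies on BFS levels grown from a set of root vertices. A minimal sketch of that generalization follows, assuming an undirected graph given as an adjacency dictionary; the name `bfs_levels` is hypothetical.

```python
from collections import deque


def bfs_levels(adj, roots):
    """Return a list whose i-th entry lists the vertices at distance exactly i
    from the root set (level 0 is the root set itself)."""
    depth = {r: 0 for r in roots}
    queue = deque(depth)
    levels = [list(depth)]
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                if depth[v] == len(levels):   # first vertex of a new level
                    levels.append([])
                levels[depth[v]].append(v)
                queue.append(v)
    return levels
```

In the resulting levels, every edge joins vertices in the same level or in adjacent levels, which is the property the proof uses.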
Appendix B Acknowledgments
The author is supported
by the Ministry of Science and Technology of Taiwan under
grant 110-2221-E-155-012-.
References
[1] P. Bose, A. Maheshwari, and P. Morin. Fast approximations for sums of distances, clustering and the Fermat–Weber problem. Computational Geometry, 24(3):135–146, 2003.
[2] C.-L. Chang. A lower bound for metric 1-median selection. Journal of Computer and System Sciences, 84:44–51, 2017.
[3] C.-L. Chang. Metric 1-median selection with fewer queries. In Proceedings of the 2017 International Conference on Applied System Innovation, pages 1056–1059, 2017.
[5] C.-L. Chang. A note on metric 1-median selection. In Proceedings of the 23rd International Computer Symposium, pages 457–459, Yunlin, Taiwan, 2018.
[6] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, 3rd edition, 2001.
[7] A. Czumaj and C. Sohler. Sublinear-time approximation algorithms for clustering via random sampling. Random Structures & Algorithms, 30(1–2):226–256, 2007.
[8] D. Eppstein and J. Wang. Fast approximation of centrality. Journal of Graph Algorithms and Applications, 8(1):39–45, 2004.
[9] O. Goldreich and D. Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473–493, 2008.
[10] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.
[11] P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 428–434, 1999.
[13] K. Jain, M. Mahdian, and A. Saberi. A new greedy approach for facility location problems. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pages 731–740, 2002.
[14] A. Kumar, Y. Sabharwal, and S. Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):5, 2010.
[15] R. R. Mettu and C. G. Plaxton. Optimal time bounds for approximate clustering. Machine Learning, 56(1–3):35–60, 2004.