HyperAIHyperAI

Command Palette

Search for a command to run...

3年前

分割凝集アルゴリズムおよび自動分類問題の複雑性

Alexander Rubchinsky

分類のための複雑性と学習曲線の分析

RTX 5090のコンピュートリソースがわずか20時間分 $1 (価値 $7)
ノートブックへ移動

概要

本論文では、自動分類(以下、簡略化してAC)問題の解法アルゴリズムを提示する。AC問題では、与えられたパターン行列、あるいは非類似度/類似度行列から出発して、1つまたは複数の分割(パーティション)を見つけることが要求される。本アルゴリズムの3段階の枠組みを提案する。内部レベルでは、頻度最小最大二分法アルゴリズムを記述する。中間レベルでは、このアルゴリズムを分割段階と凝集段階の交互実行において繰り返し用いることにより、分類のファミリーを構築する。外部レベルでは、中間レベルのアルゴリズムを複数回実行し、その後、構築されたすべての分類ファミリーの中から、互いに異なるすべての分類の集合を選択する。この最終的な集合を、与えられたAC問題のすべての解の集合として採用する。多くの場合、この解の集合は大幅に縮約可能であり(場合によっては1つの分類にまで縮約される)、初期のAC問題の複雑さの尺度として、外部レベルで見つかったすべての分類の集合の濃度に対する解の集合の濃度の比を用いる。議員の投票結果に基づく分類の場合、複雑さの一般概念は、この議会の政策の一貫性または合理性として解釈される。

One-sentence Summary

Alexander Rubchinsky presents a three-level divisive-agglomerative algorithm for automatic classification that repeatedly applies a frequency minimax dichotomy routine across alternating divisive and agglomerative stages to construct classification families, defines problem complexity as the ratio of distinct solution classifications to the total number of generated partitions, and applies this metric to evaluate parliamentary voting consistency and rationality.

Key Contributions

  • A three-level algorithmic scheme is proposed for the Automatic Classification problem to systematically construct a family of partitions. The framework applies a frequency minimax dichotomy algorithm at the internal level, alternates divisive and agglomerative stages at the intermediate level, and selects unique classifications across multiple external runs.
  • A formal complexity metric is defined as the ratio between the cardinality of the final unique solution set and the total number of classifications generated during the external search phase. This measure quantifies the inherent difficulty of the Automatic Classification problem based on the constructed solution family.
  • The algorithm is applied to voting records from the second, third, and fourth RF Dumas to analyze legislative activity. The computed complexity index is interpreted as a measure of policy consistency and rationality within these parliamentary bodies.

Introduction

Automatic classification serves as a critical tool for organizing complex data, yet its real-world deployment is frequently complicated by unpredictable problem difficulty. Practitioners often encounter ambiguous solutions, non-unique groupings, and severe computational bottlenecks when scaling to high-dimensional datasets. Previous studies have largely ignored a unified framework for measuring classification complexity, instead focusing narrowly on the computational cost of isolated algorithms. To bridge this gap, the authors formalize a new notion of automatic classification complexity and develop a generalized algorithm that constructs a complete family of reasonable solutions rather than forcing a single output. This framework allows domain experts to systematically evaluate and select optimal groupings using additional contextual criteria, a capability the authors demonstrate through an analysis of voting behavior in the Russian State Duma.

Dataset

  • Dataset composition and sources: The authors do not specify any dataset composition or external sources in the provided excerpt.
  • Subset details: The text omits key details for each subset, including sizes, origins, and filtering rules.
  • Data usage: The authors do not describe how they apply the data, such as training splits, mixture ratios, or evaluation protocols.
  • Processing and metadata: No cropping strategies, metadata construction steps, or preprocessing pipelines are outlined.

Method

The proposed method for solving the Automatic Classification (AC) problem operates through a three-level hierarchical framework, structured to progressively refine and validate classifications. At the core of this framework is a Divisive-Agglomerative Algorithm (DAA), which alternates between divisive and agglomerative stages to generate a family of classifications. The process begins with the construction of a neighborhood graph from the input dissimilarity matrix, where vertices represent objects and edges connect each object to its nearest neighbors based on proximity, forming a connected graph. This graph serves as the primary structure for subsequent operations.

At the internal level, the algorithm employs a frequency minimax dichotomy method to recursively partition the graph. This method initializes edge frequencies to zero and iteratively selects random vertex pairs. For each pair, a minimal path is constructed using Dijkstra's algorithm, where edge length is defined by current frequency. The frequencies of all edges in the path are incremented by one. This accumulation process is repeated for a fixed number of iterations, TTT, to stabilize edge frequencies. After this stage, the maximum frequency fmaxf_{\max}fmax is recorded. The algorithm then executes one final path construction and checks if the new maximum frequency fmodf_{\mathrm{mod}}fmod remains unchanged. If so, the process repeats until fmod<fmaxf_{\mathrm{mod}} < f_{\max}fmod<fmax. At this point, all edges with frequency fmaxf_{\max}fmax are removed, and the resulting connectivity components define the dichotomy of the graph. The largest component is designated as the first part, while all others form the second part. This procedure is designed to find a cut that maximizes a decomposition function, effectively separating graph components based on edge frequency.

The intermediate level, governed by the DAA, orchestrates the alternation of divisive and agglomerative stages. The process starts with the initial graph, which is dichotomized into two classes, forming the first essential classification D2D_2D2. The larger of the two resulting subgraphs is then subjected to another dichotomy, yielding a classification D3D_3D3 into three classes. This iterative divisive process continues for a maximum number of kkk dichotomies, producing a sequence of essential classifications DjD_jDj for j=2,3,,k+1j = 2, 3, \ldots, k+1j=2,3,,k+1. Following each divisive step, the agglomerative stage is initiated. For a given essential classification DjD_jDj, the algorithm constructs a series of adjoint classifications Cjj,Cj1j,,C2jC_j^j, C_{j-1}^j, \ldots, C_2^jCjj,Cj1j,,C2j by successively merging the most densely connected subsets, which is achieved by joining components connected by the maximum number of edges. This agglomerative process generates a family of classifications that are complementary to the divisive ones.

The external level of the algorithm addresses the stochastic nature of the process by executing multiple independent runs of the DAA. Due to the random selection of vertex pairs in the internal dichotomy stage, different runs can produce varying sets of classifications. The final output is determined by collecting all distinct classifications generated across all runs and selecting the unique ones. This step is crucial for identifying stable, consistent classifications that are not artifacts of random initialization. The framework's design ensures that even when individual runs yield different results, the repetition allows for the extraction of a reliable and robust solution set, which is then analyzed for stability and other criteria to determine the most meaningful classification.

Experiment

The study evaluates legislative behavior across the second, third, and fourth Russian State Dumas by modeling deputy voting records as vectors and applying a classification algorithm to track political coordination over time. Experiments validate the proposed method's parameter robustness and compare its classification stability against the K-means algorithm, demonstrating that the new approach consistently produces more reliable and coherent deputy groupings. Qualitatively, the analysis reveals that classification complexity serves as an effective indicator of systemic legislative incoordination, with measurable fluctuations aligning with major political realignments. Ultimately, the findings confirm that the metric captures overarching dynamics of collective decision-making rather than isolated voting outcomes, establishing its value as a stable analytical tool for political science research.

The authors analyze voting patterns in the Russian State Duma across multiple terms, using a method that measures the complexity of classifications derived from deputies' voting behavior. The complexity metric, calculated based on dissimilarity between deputies' voting vectors, varies over time and is influenced by the parameters of the algorithm used. The analysis reveals that classifications generated by the proposed method are more stable and consistent than those obtained using the K-means method, particularly in capturing large, coherent groups of deputies. The complexity of voting-based classifications varies over time and is influenced by algorithm parameters such as the number of dichotomies and runs. Classifications generated by the proposed method are more stable and consistently form larger, coherent groups compared to those from the K-means method. The complexity metric reflects the inconsistency and incoordination in political decision-making, not just the outcome of individual votes.

The authors analyze voting patterns in the Russian State Duma across multiple legislative periods using a complexity measure derived from classification methods. Results show that the complexity of voting classifications varies significantly between different Dumas, with Duma 2 exhibiting the highest complexity and Duma 3 the lowest, while Duma 4 shows intermediate values. The analysis suggests that complexity reflects the level of inconsistency or incoordination in political decision-making. Complexity of voting classifications is highest in Duma 2 and lowest in Duma 3, with Duma 4 showing intermediate values. The complexity measure captures the inconsistency or incoordination in political decision-making across different legislative periods. The method used produces more stable classifications compared to K-means, particularly in identifying large, cohesive groups of deputies.

The authors analyze voting patterns in the Russian State Duma across multiple legislative terms, using a mathematical model to assess the complexity of classifications derived from deputies' voting behavior. The analysis reveals variability in complexity over time, with some periods showing higher instability in voting alignments, and compares the stability of classifications generated by different clustering methods. The complexity metric is interpreted as a measure of political inconsistency or incoordination, reflecting the overall structure of decision-making rather than individual vote outcomes. Complexity of voting classifications varies across different periods, indicating shifts in political alignment and coordination. The proposed method produces more stable classifications compared to K-means, with larger and more consistent groups of deputies. Complexity is interpreted as a measure of political incoordination, influenced by the overall voting patterns rather than individual votes.

The authors analyze voting patterns in the Russian State Duma across multiple terms, using a method that measures the complexity of classifications derived from deputy voting behavior. The analysis reveals variability in complexity over time, with some periods showing lower complexity indicating more consistent voting behavior, while others exhibit higher complexity suggesting greater inconsistency or incoordination among deputies. The method is compared to K-means clustering, demonstrating greater stability in the classifications produced by the proposed approach. Complexity of voting classifications varies across different periods, with lower values indicating more consistent voting behavior among deputies. The proposed method produces more stable classifications compared to K-means clustering, particularly in cases where large, dominant groups are present. Complexity is interpreted as a measure of inconsistency or incoordination in political decision-making, rather than being determined by individual vote outcomes.

The authors analyze the complexity of classifications derived from voting patterns in the Russian State Duma across multiple legislative periods, using a method based on dissimilarity matrices and consecutive dichotomies. The complexity values vary over time and are influenced by the parameters of the algorithm, with some classifications showing greater stability than others, particularly when compared to K-means clustering. The complexity is interpreted as a measure of political inconsistency or incoordination, reflecting the overall structure of voting behavior rather than individual vote outcomes. Complexity of voting classifications varies across different legislative periods, with noticeable trends observed in smoothed data. The suggested method produces more stable classifications than K-means, especially in cases where large, distinct groups of deputies are present. Complexity is interpreted as a measure of political incoordination, influenced by the overall voting pattern rather than individual votes.

The study evaluates voting patterns across multiple Russian State Duma legislative terms by applying a complexity metric derived from classification algorithms that measure dissimilarities in deputy voting behavior. This experimental framework validates that classification complexity fluctuates across periods, functioning as a qualitative indicator of political coordination versus inconsistency in legislative decision-making. Overall, the findings demonstrate that the proposed approach consistently yields more stable and coherent deputy groupings than traditional K-means clustering, confirming its reliability for tracking structural shifts in political alignment.


AIでAIを構築

アイデアからローンチまで — 無料のAIコーディング支援、すぐに使える環境、最高のGPU価格でAI開発を加速。

AI コーディング補助
すぐに使える GPU
最適な料金体系

HyperAI Newsletters

最新情報を購読する
北京時間 毎週月曜日の午前9時 に、その週の最新情報をメールでお届けします
メール配信サービスは MailChimp によって提供されています