Qinshuo Liu

Papers

From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics
with Weiqin Zhao, Guodong Li et al.
ICLR 2025

The depth of neural networks is a critical factor in their capability, with deeper models often demonstrating superior performance. Motivated by this, significant efforts have been made to enhance layer aggregation (reusing information from previous layers to better extract features at the current layer) in order to improve the representational power of deep neural networks. However, previous works have primarily addressed this problem from a discrete-state perspective, which becomes unsuitable as the number of network layers grows. This paper instead treats the outputs of layers as states of a continuous process and leverages the state space model (SSM) to design the aggregation of layers in very deep neural networks. Moreover, inspired by its advances in modeling long sequences, the Selective State Space Model (S6) is employed to design a new module called Selective State Space Model Layer Aggregation (S6LA). This module combines traditional CNN or transformer architectures within a sequential framework, enhancing the representational capabilities of state-of-the-art vision networks. Extensive experiments show that S6LA delivers substantial improvements in both image classification and detection tasks, highlighting the potential of integrating SSMs with contemporary deep learning techniques.
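To make the idea of treating layer outputs as states concrete, here is a minimal sketch, assuming a diagonal linear state space recurrence over the layer index with an input-dependent ("selective") step size in the spirit of S6. It is not the S6LA module: the names `selective_ssm_aggregate` and `softplus`, the parameters `a`, `b` and `w`, and the simple Euler-style input term are hypothetical choices made for illustration, and real layer outputs would be feature maps or token tensors rather than short Python lists.

```python
import math


def softplus(z: float) -> float:
    # Numerically safe softplus log(1 + e^z); it is essentially linear for large z.
    return z if z > 20.0 else math.log1p(math.exp(z))


def selective_ssm_aggregate(layer_outputs, a=-1.0, b=1.0, w=None):
    """Fold layer outputs x_1..x_L into running states h_1..h_L.

    Hypothetical simplification: a diagonal linear SSM
        h_l = exp(delta_l * a) * h_{l-1} + delta_l * b * x_l,
    where the step size delta_l = softplus(w . x_l) depends on the
    current layer output, which is what makes the update "selective".
    """
    dim = len(layer_outputs[0])
    w = w if w is not None else [0.0] * dim
    h = [0.0] * dim  # state before the first layer
    states = []
    for x in layer_outputs:
        delta = softplus(sum(wi * xi for wi, xi in zip(w, x)))
        decay = math.exp(delta * a)  # a < 0 keeps the recurrence stable
        h = [decay * hi + delta * b * xi for hi, xi in zip(h, x)]
        states.append(h)
    return states


# Three "layers", each emitting a 2-dimensional feature vector (toy data).
states = selective_ssm_aggregate([[1.0, 0.0], [1.0, 2.0], [0.5, 1.0]], w=[0.5, -0.5])
print(states[-1])  # aggregated state after the last layer
```

The final state summarizes all earlier layers with an input-controlled decay, which is the sense in which an SSM view replaces a fixed, discrete set of skip connections.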

DNA-SE: towards deep neural-nets assisted semiparametric estimation
with Zhonghua Liu et al.
ICML 2024

Semiparametric statistics play a pivotal role in a wide range of domains, including missing data, causal inference, and transfer learning. In many settings, semiparametric theory leads to (nearly) statistically optimal procedures that nonetheless require numerically solving Fredholm integral equations of the second kind. Traditional numerical methods, such as polynomial or spline approximations, are difficult to scale to multi-dimensional problems. Alternatively, statisticians may approximate the original integral equations by ones with closed-form solutions, resulting in procedures that are computationally more efficient but statistically suboptimal or even incorrect. To bridge this gap, we propose a novel framework that formulates the semiparametric estimation problem as a bi-level optimization problem, and we then propose a scalable algorithm called Deep Neural-Nets Assisted Semiparametric Estimation (DNA-SE), which leverages the universal approximation property of deep neural nets (DNNs) to streamline semiparametric procedures. Through extensive numerical experiments and a real data analysis, we demonstrate the numerical and statistical advantages of DNA-SE over traditional methods. To the best of our knowledge, we are the first to bring DNNs into semiparametric statistics as a numerical solver of integral equations within our proposed general framework.
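The numerical core that DNA-SE targets can be illustrated on a toy Fredholm integral equation of the second kind, f(x) = g(x) + λ ∫_0^1 K(x, y) f(y) dy. The sketch below is a hypothetical illustration rather than the DNA-SE algorithm: it fits a parameterized function by minimizing the mean squared residual of the equation on a midpoint quadrature grid, with a two-parameter linear model standing in for the neural network, and it shows only the inner (equation-solving) level of a bi-level problem. With g(x) = x, K(x, y) = xy and λ = 1, the exact solution is f(x) = 1.5x, which serves as a check.

```python
def solve_fredholm_by_residual(g, kernel, lam=1.0, n=50, lr=1.0, steps=2000):
    """Fit f_theta(x) = theta0 + theta1 * x to
    f(x) = g(x) + lam * int_0^1 K(x, y) f(y) dy  on [0, 1].

    The integral uses an n-point midpoint rule; theta is found by
    gradient descent on the mean squared residual (hypothetical setup).
    """
    grid = [(i + 0.5) / n for i in range(n)]  # quadrature nodes, weights 1/n
    basis = [lambda t: 1.0, lambda t: t]      # stand-in for a neural network
    theta = [0.0, 0.0]

    # The residual is linear in theta: r_i = sum_k theta_k * A[i][k] - g(x_i).
    A = [
        [phi(x) - lam * sum(kernel(x, y) * phi(y) for y in grid) / n for phi in basis]
        for x in grid
    ]
    rhs = [g(x) for x in grid]

    for _ in range(steps):
        r = [sum(t * a for t, a in zip(theta, row)) - gi for row, gi in zip(A, rhs)]
        # Gradient of (1/n) * sum_i r_i^2 with respect to each theta_k.
        grad = [2.0 * sum(ri * row[k] for ri, row in zip(r, A)) / n for k in range(2)]
        theta = [t - lr * gk for t, gk in zip(theta, grad)]
    return theta


theta = solve_fredholm_by_residual(g=lambda x: x, kernel=lambda x, y: x * y)
print(theta)  # close to [0.0, 1.5]
```

In a DNN-based solver the linear basis would be replaced by a network; how DNA-SE couples the equation solver with the outer estimation problem is described in the paper and not reproduced here.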

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
with Wei Huang et al.

Large language models (LLMs) have achieved remarkable progress, but their large number of parameters results in high memory usage, significant loading latency, and substantial computational demands. To address these challenges, post-training quantization (PTQ) has emerged as an effective technique for compressing model weights. In the context of PTQ for LLMs, existing uniform quantization methods, though efficient in memory and computation, often struggle to maintain performance. In this paper, we propose SliM-LLM, a Salience-Driven Mixed-Precision Quantization scheme that allocates bit-widths group-wise with mixed precision, yielding efficient LLMs with high accuracy. Building on our observation that salient (important) weights often follow a structured distribution, we incorporate two core components to preserve post-quantization performance in LLMs while maintaining efficiency: 1) Salience-Determined Bit Allocation adaptively assigns bit-widths to groups within each layer based on their group-level salience, aiming to minimize the reconstruction error of activations; and 2) Salience-Weighted Quantizer Calibration optimizes quantizer parameters by incorporating element-level salience, ensuring that the information carried by the most critical weights is preserved. With its structured group partitioning, SliM-LLM offers a hardware-friendly quantization approach, maintaining computational and memory efficiency comparable to highly optimized uniform quantization methods. Extensive experiments demonstrate that SliM-LLM significantly improves the accuracy of various LLMs quantized to ultra-low bit-widths. For instance, a 2-bit quantized LLaMA-7B model achieves a nearly 6x memory reduction compared to its floating-point counterpart, alongside a 48% reduction in perplexity compared to the leading gradient-free PTQ method, all while maintaining GPU inference speed. Furthermore, SliM-LLM+, which incorporates gradient-based quantizers, reduces perplexity by an additional 35.1%.
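A toy version of salience-driven mixed precision is sketched below, assuming a fixed average budget of `target_bits` per weight group. The functions `allocate_bits` and `quantize_group`, the sum-of-squares salience proxy, and the rule of moving exactly `k` groups up or down by one bit are hypothetical simplifications. In SliM-LLM, Salience-Determined Bit Allocation instead chooses bit-widths with the aim of minimizing activation reconstruction error, and quantizer calibration is salience-weighted rather than plain min-max.

```python
def allocate_bits(saliences, target_bits=2, k=1):
    """Give the k most salient groups one extra bit and the k least salient
    groups one fewer bit, so the average bit-width stays at target_bits."""
    assert 2 * k <= len(saliences), "need at least 2k groups"
    order = sorted(range(len(saliences)), key=lambda i: saliences[i], reverse=True)
    bits = [target_bits] * len(saliences)
    for i in order[:k]:                   # most salient groups
        bits[i] = target_bits + 1
    for i in order[len(order) - k:]:      # least salient groups
        bits[i] = target_bits - 1
    return bits


def quantize_group(weights, bits):
    """Asymmetric min-max round-to-nearest quantization of one weight group,
    returned in dequantized (float) form."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((w - lo) / scale) * scale + lo for w in weights]


# Four toy weight groups; salience proxy = sum of squared weights (assumption).
groups = [[0.01, -0.02, 0.03], [0.9, -1.2, 0.4], [0.1, 0.05, -0.2], [0.5, -0.6, 0.7]]
saliences = [sum(w * w for w in g) for g in groups]
bits = allocate_bits(saliences, target_bits=2, k=1)
print(bits)  # [1, 3, 2, 2]
dequantized = [quantize_group(g, b) for g, b in zip(groups, bits)]
```

Because whole groups share a bit-width, each group can still be stored and dequantized with a single scale and offset, which is the property that keeps group-wise mixed precision hardware-friendly.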

PGformer: Proxy-Bridged Game Transformer for Multi-Person Highly Interactive Extreme Motion Prediction
with Yanwen Fang et al.

Multi-person motion prediction is a challenging task, especially in real-world scenarios involving highly interacting persons. Most previous works have been devoted to the case of weak interactions (e.g., walking together), in which forecasting each human pose in isolation can typically still achieve good performance. This paper focuses on collaborative motion prediction for multiple persons with extreme motions and explores the relationships between the pose trajectories of highly interacting persons. Specifically, a novel cross-query attention (XQA) module is proposed to bilaterally learn the cross-dependencies between the two pose sequences tailored for this situation. A proxy unit is additionally introduced to bridge the involved persons; it cooperates with the proposed XQA module and subtly controls the bidirectional spatial information flows. These designs are then integrated into a Transformer-based architecture, and the resulting model, called Proxy-bridged Game Transformer (PGformer), is used for multi-person interactive motion prediction. Its effectiveness has been evaluated on the challenging ExPI dataset, which involves highly interactive actions. PGformer consistently outperforms state-of-the-art methods in both short- and long-term predictions by a large margin. In addition, our approach is also compatible with the weakly interacting CMU-Mocap and MuPoTS-3D datasets and can be extended to more than two individuals with encouraging results.
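The bilateral cross-dependency idea can be sketched as plain scaled dot-product cross-attention in both directions: the frames of person A query person B's sequence, and vice versa. This is a hypothetical illustration, not the XQA module: learned query/key/value projections, multiple heads and the proxy unit are omitted, and the names `attend` and `bidirectional_cross_attention` are invented here.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def bidirectional_cross_attention(pose_a, pose_b):
    """Person A's frames query person B's frames, and B's frames query A's."""
    return attend(pose_a, pose_b, pose_b), attend(pose_b, pose_a, pose_a)


# Three frames per person, each frame a toy 2-dimensional pose embedding.
pose_a = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
pose_b = [[0.0, 0.5], [0.4, 0.4], [0.2, 0.1]]
a_ctx, b_ctx = bidirectional_cross_attention(pose_a, pose_b)
print(len(a_ctx), len(a_ctx[0]))  # 3 2
```

Each output row mixes the other person's frames, so the prediction for one person is conditioned on the partner's trajectory rather than made in isolation.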

DGCformer: Deep Graph Clustering Transformer for Multivariate Time Series Forecasting
with Yanwen Fang et al.

Multivariate time series forecasting is usually conducted in a channel-dependent (CD) way, since this incorporates more variable-relevant information. However, it may also bring in many irrelevant variables, which can even lead to worse performance than the channel-independent (CI) strategy. This paper combines the strengths of both strategies and proposes the Deep Graph Clustering Transformer (DGCformer) for multivariate time series forecasting. Specifically, it first groups relevant variables using a graph convolutional network integrated with an autoencoder, and then applies a former-latter masked self-attention mechanism in which the CD strategy is used within each group of variables and the CI strategy across different groups. Extensive experimental results on eight datasets demonstrate the superiority of our method over state-of-the-art models, and our code will be made publicly available upon acceptance.
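The rule of applying the CD strategy within a group and the CI strategy across groups can be illustrated as a channel attention mask. In the minimal sketch below, cluster assignments are supplied by hand (in DGCformer they come from the graph-convolutional autoencoder), and a masked softmax assigns zero weight to channels in other clusters. The helper names `group_mask` and `masked_channel_attention` are hypothetical, and the sketch does not reproduce the paper's former-latter masked self-attention.

```python
import math


def group_mask(cluster_ids):
    """mask[i][j] is True when channels i and j belong to the same cluster."""
    return [[ci == cj for cj in cluster_ids] for ci in cluster_ids]


def masked_channel_attention(scores, mask):
    """Row-wise softmax over channels, with cross-cluster scores set to -inf."""
    weights = []
    for row, allowed in zip(scores, mask):
        masked = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        m = max(masked)  # finite, because every channel is in its own cluster
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Channels 0 and 1 form one cluster, channel 2 another (hand-picked assumption).
mask = group_mask([0, 0, 1])
weights = masked_channel_attention(
    [[0.2, 1.0, 3.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]], mask
)
print(weights[0][2])  # 0.0
```

Within a cluster, channels share information (CD); a channel alone in its cluster attends only to itself, which reduces to the CI behavior.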
