Institutional Repository of Coll Elect & Informat Engn
ResLNet: deep residual LSTM network with longer input for action recognition
Wang, Tian1; Li, Jiakun2; Wu, Huai-Ning2; Li, Ce3; Snoussi, Hichem4; Wu, Yang5
2022-12
Journal | FRONTIERS OF COMPUTER SCIENCE |
ISSN | 2095-2228 |
Volume | 16 |
Issue | 6 |
Abstract | Action recognition is an important research topic in video analysis that remains very challenging. Effective recognition relies on learning a good representation of both spatial information (for appearance) and temporal information (for motion). These two kinds of information are highly correlated yet have quite different properties, so both cascading independent models (e.g., CNN-LSTM) and direct unbiased co-modeling (e.g., 3D CNN) yield unsatisfactory results. Moreover, a long-standing convention for this task with deep learning models is to use only 8 or 16 consecutive frames as input, which makes it hard to extract discriminative motion features. In this work, we propose a novel network structure called ResLNet (deep residual LSTM network), which accepts longer inputs (e.g., 64 frames) and lets convolutions collaborate with the LSTM more effectively under a residual structure, learning better spatial-temporal representations than previous methods without extra computational cost, thanks to the proposed embedded variable-stride convolution. The superiority of this proposal is demonstrated, along with an ablation study, on the three most popular benchmark datasets: Kinetics, HMDB51, and UCF101. The proposed network can be adopted for various input features, such as RGB and optical flow. Owing to the limited computation power of our experimental equipment and the real-time requirement, the network is evaluated on RGB input only and shows strong performance. |
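The abstract's efficiency claim is that a strided temporal convolution lets the network accept longer clips (e.g., 64 frames) without a proportional increase in downstream computation. The paper's exact "embedded variable stride convolution" is not specified in this record; the following is a minimal, hypothetical NumPy sketch of a plain strided temporal convolution that illustrates only the sequence-length-reduction idea (the function name, shapes, and stride value are illustrative assumptions, not the authors' design).

```python
import numpy as np

def strided_temporal_conv(x, weights, stride):
    """Strided 1D temporal convolution over a (T, C_in) feature sequence.

    With stride > 1 the temporal axis is shortened, so a longer clip
    (e.g. 64 frames) can be processed without a proportional increase
    in the computation of later layers.
    """
    k, c_in, c_out = weights.shape            # kernel size, channels in/out
    t = x.shape[0]
    out_len = (t - k) // stride + 1           # output time steps
    y = np.empty((out_len, c_out))
    for i in range(out_len):
        window = x[i * stride : i * stride + k]        # (k, c_in) slice
        y[i] = np.einsum('kc,kco->o', window, weights) # sum over time & channels
    return y

# A 64-frame clip with 8 feature channels, kernel size 4, stride 4:
clip = np.random.randn(64, 8)
w = np.random.randn(4, 8, 16)
out = strided_temporal_conv(clip, w, stride=4)
print(out.shape)  # (16, 16): 64 frames reduced to 16 time steps
```

With stride 4, the 64-step input collapses to 16 output steps, so whatever follows (e.g., an LSTM) sees a quarter of the sequence length.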
Keywords | action recognition; deep learning; neural network |
DOI | 10.1007/s11704-021-0236-9 |
Indexed By | SCIE ; EI |
Language | English |
WOS Research Area | Computer Science |
WOS Categories | Computer Science, Information Systems ; Computer Science, Software Engineering ; Computer Science, Theory & Methods |
WOS Accession Number | WOS:000745605300006 |
Publisher | HIGHER EDUCATION PRESS |
EI Accession Number | 20220511550497 |
EI Controlled Terms | Convolution |
EI Classification Codes | 461.4 Ergonomics and Human Factors Engineering ; 716.1 Information Theory and Signal Processing |
Source Database | WOS |
Document Type | Journal article |
Identifier | https://ir.lut.edu.cn/handle/2XXMBERH/154758 |
Collection | College of Electrical and Information Engineering |
Corresponding Author | Wu, Yang |
Affiliations | 1.Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China; 2.Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China; 3.Lanzhou Univ Technol, Coll Elect & Informat Engn, Lanzhou 730050, Peoples R China; 4.Univ Technol Troyes, Inst Charles Delaunay LM2S FRE CNRS 2019, F-10010 Troyes, France; 5.Nara Inst Sci & Technol, Inst Res Initiat, Nara 6300192, Japan |
Recommended Citation (GB/T 7714) | Wang, Tian, Li, Jiakun, Wu, Huai-Ning, et al. ResLNet: deep residual LSTM network with longer input for action recognition[J]. FRONTIERS OF COMPUTER SCIENCE, 2022, 16(6). |
APA | Wang, Tian, Li, Jiakun, Wu, Huai-Ning, Li, Ce, Snoussi, Hichem, & Wu, Yang. (2022). ResLNet: deep residual LSTM network with longer input for action recognition. FRONTIERS OF COMPUTER SCIENCE, 16(6). |
MLA | Wang, Tian, et al. "ResLNet: deep residual LSTM network with longer input for action recognition". FRONTIERS OF COMPUTER SCIENCE 16.6 (2022). |
Unless otherwise noted, all items in this repository are protected by copyright, with all rights reserved.