04/2015 - 09/2018
The University of Tokyo: Master's Student
Theme: Video recognition and generation
05/2016 - 08/2016
Johns Hopkins University: Visiting Student
Worked with Prof. Austin Reiter in the field of 3D Object Recognition
Summer 2015
NTT CS Lab: Internship
Worked with Dr. Takuya Yoshioka in the field of Automatic Speech Recognition
04/2011 - 03/2015
The University of Tokyo: Bachelor's Degree
Thesis: Robust Ego-Activities Detection of Daily Living in Diversity Environment with a Wrist-mounted Camera
Main Research Interests: Video understanding and its applications
(e.g. action recognition, event detection, egocentric vision, video captioning, video generation)

More details can be found in my CV (Jan. 2018).


Hierarchical Video Generation from Orthogonal Information: Optical flow and Texture

Katsunori Ohnishi*, Shohei Yamamoto*, Yoshitaka Ushiku, Tatsuya Harada
AAAI 2018 (Oral presentation)
* indicates equal contribution
Recognizing Activities of Daily Living with a Wrist-mounted Camera
This study proposes using a wrist-mounted camera to recognize activities of daily living (ADL). Our contributions are as follows:
1. Demonstrated the benefits of a wrist-mounted camera over a head-mounted camera for ADL recognition
2. Proposed a novel video representation
3. Developed a publicly available dataset
Katsunori Ohnishi, Atsushi Kanehira, Asako Kanezaki, Tatsuya Harada
CVPR 2016 (Spotlight presentation)
Improved Dense Trajectories with Cross-Stream
We present a new local descriptor that pools, along improved dense trajectories (iDT), a new convolutional layer obtained by crossing two-stream networks. This layer is calculated by applying discriminative weights from one network to a convolutional layer of the other network. Our method achieved state-of-the-art performance on standard action recognition datasets: 92.3% on UCF101 and 66.2% on HMDB51.
Katsunori Ohnishi, Masatoshi Hidaka, Tatsuya Harada
ACMMM 2016
Beyond Caption to Narrative: Video Captioning with Multiple Sentences
We attempt to generate video captions that convey richer content by temporally segmenting the video with action localization, generating multiple captions from a single video, and connecting them with natural language processing techniques to produce a story-like caption. We show that the proposed method generates captions that are richer in content.
Andrew Shin, Katsunori Ohnishi, Tatsuya Harada
ICIP 2016
Noise Robust Speech Recognition using Recent Developments in Neural Networks for Computer Vision
This paper considers deeper convolutional neural networks and better activation functions for speech recognition. We achieved a word error rate (WER) of 11.1% on the Aurora4 task, which is significantly better than both the baseline CNN performance of 13.2% and previously reported results.
Takuya Yoshioka, Katsunori Ohnishi, Fuming Fang, Tomohiro Nakatani


We achieved 3rd place in Task 1b: Object detection with additional training data.
Masataka Yamaguchi, Qishen Ha, Katsunori Ohnishi, Masatoshi Hidaka, Yusuke Mukuta, Tatsuya Harada
Large Scale Visual Recognition Challenge 2015 in conjunction with ICCV 2015 (Invited poster)