04/2015 - 09/2018
The University of Tokyo: Master's Student
Theme: Action Recognition, Egocentric Vision
05/2016 - 08/2016
Johns Hopkins University: Visiting Student
Worked with Prof. Austin Reiter in the field of 3D Object Recognition
Summer 2015
NTT CS Lab: Internship
Worked with Dr. Takuya Yoshioka in the field of Automatic Speech Recognition
04/2011 - 03/2015
The University of Tokyo: Bachelor's Degree
Thesis: Robust Ego-Activities Detection of Daily Living in Diversity Environment with a Wrist-mounted Camera
I am a third-year master's student in the Graduate School of Information Science and Technology at The University of Tokyo. I am a member of the Machine Intelligence Lab, supervised by Prof. Tatsuya Harada. My research interests are in computer vision and machine learning, with a focus on action recognition and egocentric vision.

Main Research Interests: Video understanding and its applications
(e.g., action recognition, event detection, egocentric vision, video captioning, video generation)

More details can be found in my CV (Apr. 2017).


Recognizing Activities of Daily Living with a Wrist-mounted Camera
This study proposes using a wrist-mounted camera to recognize activities of daily living (ADL). Our contributions are the following:
1. Demonstrated the benefits of a wrist-mounted camera over a head-mounted camera for ADL recognition
2. Proposed a novel video representation
3. Developed a publicly available dataset
Katsunori Ohnishi, Atsushi Kanehira, Asako Kanezaki, Tatsuya Harada
CVPR 2016 (Spotlight presentation)
Improved Dense Trajectories with Cross-Stream
We present a new local descriptor that pools, along improved dense trajectories (iDT), a new convolutional layer obtained by crossing two-stream networks. This layer is calculated by applying discriminative weights from one network to a convolutional layer of the other network. Our method achieves state-of-the-art performance on standard action recognition datasets: 92.3% on UCF101 and 66.2% on HMDB51.
Katsunori Ohnishi, Masatoshi Hidaka, Tatsuya Harada
ACMMM 2016
Beyond Caption to Narrative: Video Captioning with Multiple Sentences
We attempt to generate story-like video captions that convey richer content by temporally segmenting the video with action localization, generating multiple captions from a single video, and connecting them with natural language processing techniques. We show that our proposed method can generate captions that are richer in content.
Andrew Shin, Katsunori Ohnishi, Tatsuya Harada
ICIP 2016
Noise Robust Speech Recognition using Recent Developments in Neural Networks for Computer Vision
This paper considers deeper convolutional neural networks and a better activation function for speech recognition. We achieved a word error rate (WER) of 11.1% on the Aurora4 task, which is significantly better than both the baseline CNN performance of 13.2% and previously reported results.
Takuya Yoshioka, Katsunori Ohnishi, Fuming Fang, Tomohiro Nakatani


Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning
We propose to incorporate VLAD coding on a spatial pyramid for CNN features of sub-regions in order to generate image representations that better reflect the local information in images. Our results show that our compact VLAD coding can match CNN features with as little as 3% of the dimensionality and that, when combined with a spatial pyramid, it yields image captions that more accurately take local elements into account.
Andrew Shin, Masataka Yamaguchi, Katsunori Ohnishi, Tatsuya Harada
arXiv 2016


We achieved 3rd place in Task 1b: Object detection with additional training data.
Masataka Yamaguchi, Qishen Ha, Katsunori Ohnishi, Masatoshi Hidaka, Yusuke Mukuta, Tatsuya Harada
Large Scale Visual Recognition Challenge 2015 in conjunction with ICCV 2015 (Invited poster)