04/2015 - 09/2018
The University of Tokyo: Master's Student
Theme: Action Recognition, Egocentric Vision
05/2016 - 08/2016
Johns Hopkins University: Visiting Student
Worked with Prof. Austin Reiter
in the field of 3D Object Recognition
NTT CS Lab: Internship
Worked with Dr. Takuya Yoshioka
in the field of Automatic Speech Recognition
04/2011 - 03/2015
The University of Tokyo: Bachelor's Degree
Thesis: Robust Ego-Activities Detection of Daily Living in Diversity Environment with a Wrist-mounted Camera
Recognizing Activities of Daily Living with a Wrist-mounted Camera
This study proposes using a wrist-mounted camera to recognize activities of daily living (ADL).
Our contributions are the following:
1. Demonstrated the benefits of a wrist-mounted camera over a head-mounted camera for ADL recognition
2. Proposed a novel video representation
3. Developed a publicly available dataset
Katsunori Ohnishi, Atsushi Kanehira, Asako Kanezaki, Tatsuya Harada
CVPR 2016 (Spotlight presentation)
Improved Dense Trajectories with Cross-Stream
We present a new local descriptor that pools, along iDT, a new convolutional layer obtained by crossing two-stream networks. This layer is computed by applying discriminative weights from one network to a convolutional layer of the other network. Our method achieves state-of-the-art performance on ordinary action recognition datasets: 92.3% on UCF101 and 66.2% on HMDB51.
Katsunori Ohnishi, Masatoshi Hidaka, Tatsuya Harada
Beyond Caption to Narrative: Video Captioning with Multiple Sentences
We attempt to generate video captions that convey richer content by temporally segmenting the video with action localization, generating multiple captions from a single video, and connecting them with natural language processing techniques to produce a story-like caption. We show that our proposed method can generate captions that are richer in content.
Andrew Shin, Katsunori Ohnishi, Tatsuya Harada
Noise Robust Speech Recognition using Recent Developments in Neural Networks for Computer Vision
This paper considers deeper convolutional neural networks and better activation functions for speech recognition. We achieved a word error rate (WER) of 11.1% on the Aurora4 task, which is significantly better than both the baseline CNN performance of 13.2% and previously reported results.
Takuya Yoshioka, Katsunori Ohnishi, Fuming Fang, Tomohiro Nakatani
Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning
We propose applying VLAD coding on a spatial pyramid to the CNN features of sub-regions, in order to generate image representations that better reflect the local information in images. Our results show that compact VLAD coding can match CNN features with as little as 3% of their dimensionality and that, when combined with a spatial pyramid, it yields image captions that more accurately take local elements into account.
Andrew Shin, Masataka Yamaguchi, Katsunori Ohnishi, Tatsuya Harada
We achieved 3rd place in Task 1b: Object detection with additional training data.
Masataka Yamaguchi, Qishen Ha, Katsunori Ohnishi, Masatoshi Hidaka, Yusuke Mukuta, Tatsuya Harada
Large Scale Visual Recognition Challenge 2015 in conjunction with ICCV 2015 (Invited poster)