Big Data Management and Application Laboratory (BIGBASE)

We are currently recruiting undergraduate researchers, MS/Ph.D. students, and postdoctoral researchers.

We are looking for postdoctoral researchers, MS/Ph.D. students, and undergraduate students who are passionate about data engineering, data analytics, and data management for large-scale data sets and diverse data types.

If you are interested in joining our lab, please send an email to hyukyoon.kwon [at] seoultech.ac.kr with your 1) CV, 2) cover letter, and 3) transcript. We may then arrange an online or in-person interview.

Research Area

We research the management and application of big data, building on data engineering techniques.
What distinguishes our lab is that we cover the entire process of managing and analyzing big data, from data collection through data analysis and visualization.
Our main goal is for students to learn practical data engineering and analysis techniques that are used in industry.

Details of Research Area

Our research falls into three areas: 1) big data collection, 2) big data storage and management, and 3) big data analytics and applications. Across all three, we emphasize data engineering techniques that make the overall cycle efficient and intelligent.

Big Data Collection

We collect various types of data from multiple sources. In particular, we aim to collect large-scale data at high speed from diverse environments, including the Web, mobile devices, IoT devices, and smart factories.

Target Data Types: Tweets, Log, Location, Text, Graph, Image, Video
Covered Techniques: Web crawling (Scrapy, Selenium, BeautifulSoup), Distributed and parallel crawling (DeepScraper), Web page analysis, Streaming data processing
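
As a rough illustration of the web page analysis step in crawling, the sketch below pulls outgoing links from an HTML page so that a crawler could queue them for later visits. It uses only Python's standard library (html.parser) rather than Scrapy, Selenium, or BeautifulSoup, and the names LinkExtractor and extract_links are hypothetical, chosen for this example only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in the given HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a></p>'
    for link in extract_links(page, "https://example.com/"):
        print(link)
```

A real crawler would add fetching, politeness delays, and de-duplication of visited URLs on top of this parsing step.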

Big Data Storage and Management

We store large-scale data in databases and distributed storage and manage it so that it connects effectively to the analysis stage. This includes selecting the most suitable database type for the given data and using a queuing system to absorb the speed difference between producers and consumers.

Covered Techniques: Relational databases (Oracle, MariaDB), Distributed file systems (HDFS), Hadoop ecosystem (Hive, Impala, Sqoop), Key-value stores (Redis, RocksDB, LevelDB), Document stores (MongoDB), Search engines (Elasticsearch), Time-series databases (InfluxDB), Queuing systems (Kafka), Graph databases (Neo4j)
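
To illustrate how a queue can absorb the speed difference between a producer and a consumer, here is a minimal in-process sketch using Python's standard queue and threading modules. It is not Kafka and does not reflect any particular system used in the lab; the function name run_pipeline, the buffer size, and the simulated delay are assumptions made for this example.

```python
import queue
import threading
import time


def run_pipeline(n_items=20, buffer_size=5):
    """Pass n_items from a fast producer to a slower consumer through a bounded buffer."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream
    consumed = []

    def producer():
        for i in range(n_items):
            # put() blocks while the buffer is full, so the producer
            # is slowed down to the pace of the consumer.
            buffer.put(i)
        buffer.put(done)

    def consumer():
        while True:
            item = buffer.get()
            if item is done:
                break
            time.sleep(0.001)  # simulate slower downstream processing
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


if __name__ == "__main__":
    print(run_pipeline())
```

The bounded buffer is the key design choice here: it caps memory use and applies backpressure to the producer, whereas a system such as Kafka instead persists messages to disk so producers and consumers can run fully independently.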

Big Data Analytics and Applications

We extract useful information from large-scale data and from integrated data of multiple types. We apply distributed or federated computing techniques to handle large-scale data sets, and deep learning and machine learning models to discover new information intelligently. Finally, we visualize the results for end users.

Covered Techniques: Parallel and distributed computing (Apache Spark), Machine Learning, Deep Learning (PyTorch, TensorFlow), Data Visualization (ELK, Grafana)