Big Data-Driven AI Laboratory (BIGBASE)
We are recruiting undergraduate researchers (with a path into the master's program), MS/Ph.D. students, and postdoctoral researchers.
We are looking for post-doc researchers, MS/Ph.D. students, and undergraduate students who are passionate about 1) Data Engineering (ML Ops and Data Ops), 2) AI-focused Data Analysis, and 3) Large-Scale Data Management (Cloud or Distributed). We target globally top-level research in AI and Big Data: publications at top conferences (SIGMOD, NeurIPS, AAAI, ICDE, KDD, ICDM, WEB) and in top journals (TKDE, TII, TCC).
If you are interested in joining our lab, please send an email (hyukyoon.kwon [at] seoultech.ac.kr) that includes 1) your CV, 2) a cover letter, and 3) your transcript. We may then arrange an online or in-person interview.
Recent News
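[Top Conference] "FedSDP: Federated Self-Derived Prototypes for Personalized Federated Learning" was accepted to IEEE ICDE 2025 and will be presented in May 2025 in Hong Kong. Congrats to Jihoon Moon. This work was done in collaboration with Prof. Ling Liu at Georgia Tech.
[New Project] Selected for the National Research Foundation of Korea (NRF) Outstanding Research, Mid-Career Researcher Program (5 years, KRW 900 million). "Bias Analysis of Web Data and Construction of Fair Learning Models," 2025.3~2030.2 (Principal Investigator).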
[Top Conference] "TAIL-MIL: Time-Aware and Instance-Learnable Multiple Instance Learning for Multivariate Time Series Anomaly Detection" was accepted to AAAI 2025 and will be presented in Feb. 2025 in Philadelphia, USA. Congrats to Jaeseok Jang.
[Overseas Training] Jaeseok Jang will join Georgia Tech as a visiting student from Dec. 2024 to Feb. 2025.
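[Award] Jaeseok Jang (MS student) won the Excellence Award at the 20th Master's Thesis Competition of the Korean Institute of Industrial Engineers, Nov. 2024.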
[Top Conference] "Multi-Level Graph Representation Learning Through Predictive Community-based Partitioning" was accepted to ACM SIGMOD 2025 and will be presented in June 2025 in Berlin, Germany. Congrats to Bo-Young Lim and Jeong-Ha Park.
[Top Conference] "Are Multiple Instance Learning Algorithms Learnable for Instances?" was accepted to NeurIPS 2024 and will be presented in Dec. 2024 in Vancouver, Canada. Congrats to Jaeseok Jang.
[Overseas Training] Min-Seon Kim joined Georgia Tech as a visiting student from June 2024 to August 2024.
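[New Project] Selected for the Ministry of Trade, Industry and Energy International Joint Technology Development Program (UK). "NPU-based AIoT Edge System for Failure Prognosis and Predictive Detection of Safety Risks to Improve the Availability of Energy Asset Management Systems," 2024.11~2026.10 (Principal Investigator for the Korean university partner).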
[Top Journal] "SaaN 2L-GRL: Two-Level Graph Representation Learning Empowered with Subgraph-as-a-Node" was accepted by IEEE TKDE in June 2024 (Impact Factor: 8.9, top 1.8%). Congrats to Jeong-Ha Park and Bo-Young Lim.
[Top Journal] "Learning with Correlation-Guided Attention for Multi-Energy Consumption Forecasting" was accepted by IEEE TII in June 2024 (Impact Factor: 11.7, top 1.5%). Congrats to Jong Seong Park, Jeong-Ha Park, and Jihyeok Choi.
[Top Journal] "Self-Training of Cyber-Threat Classification Model with Threat-Payload Centric Augmentation" was accepted by IEEE TII in May 2024 (Impact Factor: 11.7, top 1.5%). Congrats to Jae-yeol Kim.
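[New Project] Selected for a market-responsive project of the Ministry of SMEs and Startups. "Development of a Real-Time Intelligent Building Automation Control Platform Applying Correlation-Analysis-based XAI and an AIoT Edge Detector," 2024.5~2026.4 (Principal Investigator for the joint research institution).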
Research Area
Data-Driven AI
AI-focused Practical Analysis
Continual learning (IEEE BigData2024a, IEEE BigData2024b)
Self-training (IEEE TII2024)
Federated learning (IEEE ICDE2025)
Multimodal learning
Large language model
Scalable Data Computing
Data scraping
Edge-Cloud computing
Distributed data ingestion
Data pipeline
Web search analysis
Details of Research Area
Our research falls into three areas: 1) big data collection, 2) big data management, and 3) big data analytics. Throughout, we emphasize data engineering techniques that make the overall cycle efficient and intelligent.
Big Data Collection
We collect various types of data from multiple sources. In particular, we aim to collect large-scale data at high speed from various environments, including the Web, mobile devices, IoT devices, and smart factories.
Target Data Types: Tweets, Logs, Locations, Text, Graphs, Images, Videos
Covered Techniques: Web crawling (Scrapy, Selenium, BeautifulSoup), Distributed and parallel crawling (DeepScraper), Web page analysis, Streaming data processing
Big Data Storage and Management
We store large-scale data in databases and distributed storage systems and manage it so that it connects effectively to the analysis. This includes selecting the most suitable database type for the given data and using a queuing system to absorb the speed difference between producers and consumers.
Covered Techniques: Relational databases (Oracle, MariaDB), Distributed file systems (HDFS), Hadoop ecosystem (Hive, Impala, Sqoop), Key-value stores (Redis, RocksDB, LevelDB), Document stores (MongoDB), Search engines (Elasticsearch), Time-series databases (InfluxDB), Queuing systems (Kafka), Graph databases (Neo4j)
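As a rough illustration of how a queue absorbs the speed difference between producers and consumers, the sketch below uses Python's standard-library `queue.Queue` with a bounded size. It is an in-process toy, not Kafka's API: a system such as Kafka decouples the two sides with a durable, distributed log rather than by blocking the producer. The names `run_pipeline` and `buffer_size` and the doubling step that stands in for a storage write are assumptions made for this example only.

```python
import queue
import threading


def run_pipeline(items, buffer_size=4):
    """Pass items from a producer thread to a consumer thread via a bounded queue."""
    buffer = queue.Queue(maxsize=buffer_size)  # put() blocks once the queue is full
    sentinel = object()                        # marks the end of the stream
    consumed = []

    def producer():
        for item in items:
            buffer.put(item)   # waits here whenever the consumer falls behind
        buffer.put(sentinel)

    def consumer():
        while True:
            item = buffer.get()
            if item is sentinel:
                break
            consumed.append(item * 2)  # hypothetical stand-in for a storage write

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


result = run_pipeline(range(10))
```

With a single consumer the items come out in the order they were produced. The bounded size is the design choice that matters: it caps memory use and makes a fast producer slow down to the consumer's pace.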
Big Data Analytics and Applications
We extract useful information from large-scale data and from integrated data of multiple types. We apply distributed or federated computing techniques to handle large-scale datasets, and machine learning and deep learning models to discover new information. Finally, we visualize the results for end users.
Covered Techniques: Parallel and distributed computing (Apache Spark), Machine Learning, Deep Learning (PyTorch, TensorFlow), Data Visualization (ELK, Grafana)