Classifying Canadian Immigration Topics on Reddit with DistilBERT

Authors: Fanmei Wang & Hongan Lai

A two-person postgraduate project. I built the end-to-end pipeline on 10k+ Reddit submissions, designed a taxonomy of 8 topic labels, curated ~1.1k human-verified samples, and fine-tuned DistilBERT on them (test accuracy ≈ 78.6%). Hongan assisted with text cleaning, contributed to manual verification, built several baseline models, implemented a small Flask demo, and recorded the project video. We use the classifier to examine how the distribution of topics on Reddit shifted before and after the May 31, 2023 policy.
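The topic-shift analysis can be sketched as follows. This is a minimal illustration only. It assumes every submission has already been labeled by the classifier and paired with its creation date. The function names `topic_shares` and `topic_shift`, the label strings in the example, and the choice to count posts dated exactly May 31, 2023 as "after" are hypothetical and are not taken from the project's code.

```python
from collections import Counter
from datetime import date

# Cutoff date from the project description; treating the cutoff day itself
# as part of the "after" period is an assumption of this sketch.
POLICY_DATE = date(2023, 5, 31)


def topic_shares(posts, cutoff=POLICY_DATE):
    """Return per-label proportions before and after `cutoff`.

    `posts` is an iterable of (post_date, predicted_label) pairs, e.g. the
    classifier's predictions joined with each submission's creation date.
    """
    before, after = Counter(), Counter()
    for post_date, label in posts:
        # Posts strictly earlier than the cutoff go to "before"; the rest to "after".
        (before if post_date < cutoff else after)[label] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {label: n / total for label, n in counts.items()} if total else {}

    return normalize(before), normalize(after)


def topic_shift(posts, cutoff=POLICY_DATE):
    """Percentage-point change in each label's share (after minus before)."""
    before, after = topic_shares(posts, cutoff)
    labels = sorted(set(before) | set(after))
    return {
        label: 100 * (after.get(label, 0.0) - before.get(label, 0.0))
        for label in labels
    }
```

For example, with two hypothetical labels, two posts before the cutoff (one of each label) and two posts after it (both `express_entry`), `topic_shift` returns `{"express_entry": 50.0, "study_permit": -50.0}`. Comparing shares rather than raw counts keeps the result meaningful even if overall posting volume changes around the policy date.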