Recent research from the University of Cagliari and the University of Salerno highlights significant vulnerabilities in user passwords due to the availability of personal information on social media. The study, which employs a tool named SODA ADVANCE, investigates how publicly accessible data can be aggregated into reconstructed user profiles and used to assess password strength, revealing alarming implications for user security.
SODA ADVANCE is designed to compile user profiles from public sources such as Facebook, Instagram, and LinkedIn. By using facial recognition technology, it merges data into a cohesive profile to evaluate the security of user passwords. This evaluation is quantified through a metric known as Cumulative Password Strength, which ranges from 0 to 1. A higher score indicates stronger passwords that are less susceptible to guessing based on the individual’s online presence.
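The study does not publish SODA ADVANCE's internals, but the profile-merging step can be pictured with a short sketch. The example below uses the open-source face_recognition library, which is an assumption for illustration rather than the study's actual stack; the file names and the 0.6 tolerance are placeholders.

```python
# Illustrative sketch of face-based profile linking (not the study's actual code).
# Assumes the open-source `face_recognition` package; file names are placeholders.
import face_recognition

def same_person(photo_a: str, photo_b: str, tolerance: float = 0.6) -> bool:
    """Return True if the two photos appear to show the same face."""
    enc_a = face_recognition.face_encodings(face_recognition.load_image_file(photo_a))
    enc_b = face_recognition.face_encodings(face_recognition.load_image_file(photo_b))
    if not enc_a or not enc_b:
        return False  # no face detected in one of the images
    # Compare the first detected face in each photo.
    return bool(face_recognition.compare_faces([enc_a[0]], enc_b[0], tolerance=tolerance)[0])

if same_person("facebook_avatar.jpg", "linkedin_avatar.jpg"):
    print("Likely the same user: merge the two profiles")
```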
To conduct the research, the team gathered information from 100 volunteers, asking for their name, surname, and a photograph. With this minimal data, SODA ADVANCE successfully located matching profiles across various social media platforms. The tool then analyzed passwords provided by the volunteers against the reconstructed profiles, shedding light on how easily these passwords could be guessed by malicious actors.
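The paper's exact Cumulative Password Strength formula is not reproduced here, but the idea of scoring a password against a reconstructed profile can be sketched as follows. The token extraction and the 0-to-1 scoring rule are simplifying assumptions for illustration, not the authors' definition, and the profile values are invented.

```python
import re

def profile_tokens(profile: dict) -> set:
    """Collect lowercase tokens (names, places, hobbies, years) from a reconstructed profile."""
    tokens = set()
    for value in profile.values():
        for tok in re.findall(r"[A-Za-z]{3,}|\d{2,4}", str(value)):
            tokens.add(tok.lower())
    return tokens

def personalized_strength(password: str, profile: dict) -> float:
    """Toy 0-1 score: the larger the share of the password covered by profile tokens, the weaker it is."""
    pw = password.lower()
    covered = sum(len(t) for t in profile_tokens(profile) if t in pw)
    return max(0.0, 1.0 - covered / len(password))

profile = {"name": "Luca", "city": "Cagliari", "birth_year": 1987, "hobby": "sailing"}
print(personalized_strength("Luca1987!", profile))     # low: mostly personal data
print(personalized_strength("t9#Vq!r2Lz@x", profile))  # high: no overlap with the profile
```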
The study also explored the performance of several large language models (LLMs), including Claude, ChatGPT, Google Gemini, Dolly, LLaMa, and Falcon. The LLMs were tasked with generating strong yet memorable passwords based on user details without directly reusing that information. In this initial phase, Claude achieved the highest average score of 0.82, demonstrating superior ability to generate varied and secure passwords. In contrast, Dolly, LLaMa, and Falcon produced less effective passwords with average scores around 0.66, primarily due to repetitive and easily guessable patterns.
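The exact prompts used in the study are not reproduced in this summary; the sketch below shows one plausible way to phrase the generation task, with `chat` standing in as a hypothetical wrapper around whichever LLM is being tested.

```python
def build_password_prompt(profile: dict) -> str:
    """Ask the model for a strong but memorable password that does not reuse profile data verbatim."""
    facts = "; ".join(f"{k}: {v}" for k, v in profile.items())
    return (
        "You are helping a user choose a password.\n"
        f"User details: {facts}\n"
        "Propose one password that is strong and memorable for this user, "
        "but do NOT directly reuse any of the details above (no names, dates, or places). "
        "Reply with the password only."
    )

# `chat` is a hypothetical function that sends a prompt to the LLM under test
# (Claude, ChatGPT, Gemini, Dolly, LLaMa, Falcon, ...) and returns its reply as a string.
def generate_password(profile: dict, chat) -> str:
    return chat(build_password_prompt(profile)).strip()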
Following this, the researchers assessed whether these LLMs could accurately evaluate password strength when provided with reconstructed user information. Each model received a mix of strong and weak passwords alongside detailed user profiles. Claude again excelled, with accuracy, precision, recall, and F1 all at 0.75. The research revealed that models performed significantly better when they had access to comprehensive personal data; Falcon's precision, for example, improved dramatically from 0.48 to 0.77.
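Accuracy, precision, recall, and F1 here are the standard binary-classification measures, with "weak" treated as the positive class for illustration. The labels below are made up, but they show how all four figures can coincide at a single value such as 0.75.

```python
# Toy evaluation: 1 = password judged weak (positive class), 0 = judged strong.
truth     = [1, 1, 1, 0, 0, 0, 1, 0]  # ground-truth labels (illustrative)
predicted = [1, 1, 0, 0, 1, 0, 1, 0]  # one model's verdicts (illustrative)

tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)

accuracy  = (tp + tn) / len(truth)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)  # all 0.75 with these toy labels
```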
The results indicate that LLMs are more adept at identifying risky passwords when they can reference meaningful personal context. Passwords that include elements such as birthdays, locations, or hobbies were flagged as weak more consistently when paired with relevant user data.
To benchmark SODA ADVANCE against conventional password strength tools, the researchers analyzed 250 passwords sourced from leaked datasets. Most tools categorized these passwords as medium strength. In contrast, SODA ADVANCE frequently identified passwords containing personal information as weak, despite other tools labeling them as strong due to their complexity. This discrepancy underscores the importance of evaluating a password's connection to the user's digital footprint, rather than relying solely on syntactic complexity.
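The gap between syntactic and context-aware ratings is easy to reproduce with off-the-shelf estimators. The snippet below uses the open-source zxcvbn library, chosen purely for illustration (it is not stated to be one of the tools in the study): the same password is judged on composition alone in the first call, and its rating typically drops once known personal details are supplied as `user_inputs`.

```python
from zxcvbn import zxcvbn

# A password that looks complex (mixed case, digits, symbol) but is built from personal data.
password = "Cagliari1987!"

plain = zxcvbn(password)
informed = zxcvbn(password, user_inputs=["Luca", "Cagliari", "1987", "sailing"])

# zxcvbn scores range from 0 (very weak) to 4 (very strong).
print("no context  :", plain["score"], plain["guesses_log10"])
print("with context:", informed["score"], informed["guesses_log10"])
```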
The final experiment involved testing PassBERT, a targeted password guessing model, on 25,000 passwords generated for the participants. PassBERT successfully inferred only 22 of them, a success rate below 0.1 percent, illustrating the effectiveness of combining semantic personalization with syntactic complexity. Despite being inspired by user traits, the generated passwords maintained structures that deviated from typical guessing patterns.
This research sheds light on the dangers that publicly shared social media information poses to password security, emphasizing the need for users to adopt password practices that account for their digital footprint. With evolving technology and increasing access to personal data, the study serves as a crucial reminder of the vulnerabilities inherent in digital security.