Speaker: Aylin Last year I presented research showing how to de-anonymize programmers based on their coding style. This is of immediate concern to open source software developers who would like to remain anonymous. On the other hand, being able to de-anonymize programmers can help in forensic investigations, or in resolving plagiarism claims or copyright disputes. I will report on our new research findings in the past year. We were able to increase the scale and accuracy of our methods dramatically and can now handle 1,600 programmers, reaching 94% de-anonymization accuracy. In ongoing research, we are tackling the much harder problem of de-anonymizing programmers from binaries of compiled code. This can help identify the author of a suspicious executable file and can potentially aid malware forensics. We demonstrate the efficacy of our techniques using a dataset collected from GitHub. It is possible to identify individuals by de-anonymizing different types of large datasets. Once individuals are de-anonymized, different types of personal details can be detected from data that belong to them. Furthermore, their identities across different platforms can be linked. This is possible through utilizing machine learning methods that represent human data with a numeric vector that consists of features. Then a classifier is used to learn the patterns of each individual, to classify a previously unseen feature vector. Tor users, social networks, underground cyber forums, the Netflix dataset have been de-anonymized in the past five years. Advances in machine learning and the improvements in computational power, such as cloud computing services, make these large scale de-anonymization tasks possible in a feasible amount of time. As data aggregators are collecting vast amounts of data from all possible digital media channels and as computing power is becoming cheaper, de-anonymization threatens privacy on a daily basis. Last year, we showed how we can de-anony