-

Project 19: Exploring evolution of Linux
1. Introduction
Version control repositories like CVS, Subversion or Git can be a real gold mine for software developers. They contain every change to the source code including the date (the “when”), the responsible developer (the “who”), as well as a little message that describes the intention (the “what”) of a change. In this notebook,…
-

Project 18: The Hottest topics in Machine Learning
1. Loading the NIPS papers
The NIPS conference (Neural Information Processing Systems) is one of the most prestigious yearly events in the machine learning community. At each NIPS conference, a large number of research papers are published. Over 50,000 PDF files were automatically downloaded and processed to obtain a dataset on various machine learning techniques.…
-

Project 8: Netflix Movie Data – Analysis
1. Loading your friend’s data into a dictionary
Netflix! What started in 1997 as a DVD rental service has since exploded into the largest entertainment/media company by market capitalization, boasting over 200 million subscribers as of January 2021. Given the large number of movies and series available on the platform, it is a perfect opportunity to flex…
-

Project 6: Scala Programming History on GitHub – Analysis
1. Scala’s real-world project repository data
With almost 30k commits and a history spanning over ten years, Scala is a mature programming language. It is a general-purpose programming language that has recently become another prominent language for data scientists. Scala is also an open source project. Open source projects have the advantage that their entire…