  • Project 19: Exploring evolution of Linux

    1. Introduction Version control repositories like CVS, Subversion or Git can be a real gold mine for software developers. They contain every change to the source code including the date (the “when”), the responsible developer (the “who”), as well as a little message that describes the intention (the “what”) of a change. In this notebook,…


  • Project 18: The Hottest topics in Machine Learning

    1. Loading the NIPS papers The NIPS conference (Neural Information Processing Systems) is one of the most prestigious yearly events in the machine learning community. At each NIPS conference, a large number of research papers are published. Over 50,000 PDF files were automatically downloaded and processed to obtain a dataset on various machine learning techniques.…


  • Project 17: Generating Keywords for Google Ads: Low cost furniture store

    1. The brief Imagine working for a digital marketing agency, and the agency is approached by a massive online retailer of furniture. They want to test our skills at creating large campaigns for their entire website. We are tasked with creating a prototype set of keywords for search campaigns for their sofas section. The…


  • Project 8: Netflix Movie Data – Analysis

    1. Loading your friend’s data into a dictionary Netflix! What started in 1997 as a DVD rental service has since exploded into the largest entertainment/media company by market capitalization, boasting over 200 million subscribers as of January 2021. Given the large number of movies and series available on the platform, it is a perfect opportunity to flex…


  • Project 7: The NYC Airbnb Market – Analysis

    1. Importing the Data Welcome to New York City (NYC), one of the most-visited cities in the world. As a result, there are many Airbnb listings to meet the high demand for temporary lodging for anywhere from a few nights to many months. In this notebook, we will take a look at the NYC Airbnb market by…


  • Project 6: Scala Programming History on GitHub – Analysis

    1. Scala’s real-world project repository data With almost 30k commits and a history spanning over ten years, Scala is a mature programming language. It is a general-purpose programming language that has recently become another prominent language for data scientists. Scala is also an open source project. Open source projects have the advantage that their entire…


  • Project 5: Exploring FIFA World Cup Data

    This dataset (source) includes 44,066 results of international football matches starting from the very first official match in 1872 up to 2022. The matches range from FIFA World Cup to FIFI Wild Cup to regular friendly matches. The matches are strictly men’s full internationals and the data does not include Olympic Games or matches where at least…


  • Project 4: Search for World’s Oldest Businesses

    1. The oldest businesses in the world This is Staffelter Hof Winery, Germany’s oldest business, which was established in 862 under the Carolingian dynasty. It has continued to serve customers through dramatic changes in Europe such as the Holy Roman Empire, the Ottoman Empire, and both world wars. What characteristics enable a business to stand…


  • Project 3: Reducing Traffic Mortality in the USA

    1. The raw data files and their format While the rate of fatal road accidents has been decreasing steadily since the 80s, the past ten years have seen a stagnation in this reduction. Coupled with the increase in the number of miles driven in the nation, the total number of traffic-related fatalities has now reached a…