Project 4: Search for World’s Oldest Businesses

    1. The oldest businesses in the world

    This is Staffelter Hof Winery, Germany’s oldest business, which was established in 862 under the Carolingian dynasty. It has continued to serve customers through dramatic changes in Europe such as the Holy Roman Empire, the Ottoman Empire, and both world wars. What characteristics enable a business to stand the test of time?

    Image: The entrance to Staffelter Hof Winery, a German winery established in 862. Image credit: Martin Kraft

    To help answer this question, BusinessFinancing.co.uk researched the oldest company that is still in business in almost every country and compiled the results into a dataset. Let’s explore this work to better understand these historic businesses. Our datasets, which are all located in the datasets directory, contain the following information:

    businesses and new_businesses

    column          type      meaning
    business        varchar   Name of the business.
    year_founded    int       Year the business was founded.
    category_code   varchar   Code for the category of the business.
    country_code    char      ISO 3166-1 3-letter country code.

    Countries

    column          type      meaning
    country_code    varchar   ISO 3166-1 3-letter country code.
    country         varchar   Name of the country.
    continent       varchar   Name of the continent that the country exists in.

    Categories

    column          type      meaning
    category_code   varchar   Code for the category of the business.
    category        varchar   Description of the business category.

    Now let’s learn about some of the world’s oldest businesses still in operation!

    2. The oldest businesses in North America

    So far we’ve learned that Kongō Gumi is the world’s oldest continuously operating business, beating out the second oldest business by well over 100 years! It’s a little hard to read the country codes, though. Wouldn’t it be nice if we had a list of country names to go along with the country codes?

    Enter countries.csv, which is also located in the datasets folder. Having useful information in different files is a common problem: for data storage, it’s better to keep different types of data separate, but for analysis, we want all the data in one place. To solve this, we’ll have to join the two tables together.

    countries

    column          type      meaning
    country_code    varchar   ISO 3166-1 3-letter country code.
    country         varchar   Name of the country.
    continent       varchar   Name of the continent that the country exists in.

    Since countries.csv contains a continent column, merging the datasets will also allow us to look at the oldest business on each continent!
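
    To make the join concrete before running it on the real files, here is a minimal sketch on a handful of rows. The two toy DataFrames are stand-ins, not the actual datasets: "Example Co" and the code "XXX" are made up for illustration, and only the column names follow the tables above.

```python
import pandas as pd

# Toy stand-in for the businesses table ("Example Co" / "XXX" are hypothetical)
toy_businesses = pd.DataFrame({
    "business": ["Kongō Gumi", "La Casa de Moneda de México", "Example Co"],
    "year_founded": [578, 1534, 1900],
    "country_code": ["JPN", "MEX", "XXX"],
})

# Toy stand-in for the countries table
toy_countries = pd.DataFrame({
    "country_code": ["JPN", "MEX", "AGO"],
    "country": ["Japan", "Mexico", "Angola"],
    "continent": ["Asia", "North America", "Africa"],
})

# .merge() defaults to an inner join: only codes found in BOTH tables are kept,
# so "XXX" (no matching country) and "AGO" (no matching business) both drop out
toy_joined = toy_businesses.merge(toy_countries, on="country_code")
print(toy_joined)
```

    Because the default is an inner join, rows without a partner on the other side silently disappear, which is worth keeping in mind when row counts change after a merge.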

    In [2]:

    # Load countries.csv to a DataFrame
    countries = pd.read_csv('datasets/countries.csv')

    # Merge sorted_businesses with countries
    businesses_countries = sorted_businesses.merge(countries, on='country_code')

    # Filter businesses_countries to include countries in North America only
    north_america = businesses_countries[businesses_countries.continent.str.contains('North')]
    north_america.head()
    

    Out[2]:

        business                      year_founded  category_code  country_code  country        continent
    22  La Casa de Moneda de México   1534          CAT12          MEX           Mexico         North America
    28  Shirley Plantation            1638          CAT1           USA           United States  North America
    33  Hudson’s Bay Company          1670          CAT17          CAN           Canada         North America
    35  Mount Gay Rum                 1703          CAT9           BRB           Barbados       North America
    40  Rose Hall                     1770          CAT19          JAM           Jamaica        North America

    3. The oldest business on each continent

    Now we can see that the oldest company in North America is La Casa de Moneda de México, founded in 1534. Why stop there, though, when we could easily find out the oldest business on every continent?

    In [3]:

    # Create continent, which lists only the continent and oldest year_founded
    continent = businesses_countries.groupby("continent", as_index=False)["year_founded"].min()

    # Merge continent with businesses_countries, matching on both continent and year
    # so that a business on another continent with the same founding year is not pulled in
    merged_continent = continent.merge(businesses_countries, on=["continent", "year_founded"])

    # Subset merged_continent so that only the four columns of interest are included
    subset_merged_continent = merged_continent[["continent", "country", "business", "year_founded"]]
    subset_merged_continent
    

    Out[3]:

       continent      country    business                     year_founded
    0  Africa         Mauritius  Mauritius Post               1772
    1  Asia           Japan      Kongō Gumi                   578
    2  Europe         Austria    St. Peter Stifts Kulinarium  803
    3  North America  Mexico     La Casa de Moneda de México  1534
    4  Oceania        Australia  Australia Post               1809
    5  South America  Peru       Casa Nacional de Moneda      1565
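
    An alternative to the group-then-merge approach is to ask each group for the row label of its minimum with .idxmin() and select those rows directly. The sketch below runs on toy rows, where "Newer Example Ltd" is a made-up entry added only so that each continent has more than one candidate; it is not the notebook’s own method.

```python
import pandas as pd

# Toy rows; "Newer Example Ltd" is hypothetical
toy = pd.DataFrame({
    "business": ["Kongō Gumi", "St. Peter Stifts Kulinarium", "Sean’s Bar", "Newer Example Ltd"],
    "year_founded": [578, 803, 900, 1900],
    "country": ["Japan", "Austria", "Ireland", "Japan"],
    "continent": ["Asia", "Europe", "Europe", "Asia"],
})

# For each continent, find the index label of the row with the smallest year_founded
oldest_idx = toy.groupby("continent")["year_founded"].idxmin()

# Use those labels to pull the full rows, keeping the four columns of interest
oldest_per_continent = toy.loc[oldest_idx, ["continent", "country", "business", "year_founded"]]
print(oldest_per_continent)
```

    One difference to be aware of: .idxmin() returns a single row per group even when several businesses tie for the earliest year, whereas a merge on the minimum year keeps every tied row.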

    4. Unknown oldest businesses

    BusinessFinancing.co.uk wasn’t able to determine the oldest business for some countries, and those countries are simply left out of businesses.csv and, by extension, businesses. However, the countries DataFrame that we created does include all countries in the world, regardless of whether the oldest business is known.

    We can compare the two datasets in one DataFrame to find out which countries don’t have a known oldest business!

    In [4]:

    # Use .merge() to create a DataFrame, all_countries
    all_countries = businesses.merge(countries, on='country_code', how='right')

    # Filter to include only countries without oldest businesses
    missing_countries = all_countries[all_countries.business.isnull()]

    # Create a series of the country names with missing oldest business data
    missing_countries_series = missing_countries.country

    # Display the series
    missing_countries_series
    

    Out[4]:

    1                                Angola
    7                   Antigua and Barbuda
    18                              Bahamas
    48                   Dominican Republic
    50                              Ecuador
    57                                 Fiji
    59      Micronesia, Federated States of
    63                                Ghana
    65                               Gambia
    69                              Grenada
    79            Iran, Islamic Republic of
    89                           Kyrgyzstan
    91                             Kiribati
    92                Saint Kitts and Nevis
    107                              Monaco
    108                Moldova, Republic of
    110                            Maldives
    112                    Marshall Islands
    131                               Nauru
    138                               Palau
    139                    Papua New Guinea
    143                            Paraguay
    144                 Palestine, State of
    153                     Solomon Islands
    160                            Suriname
    170                          Tajikistan
    171                        Turkmenistan
    172                         Timor-Leste
    173                               Tonga
    177                              Tuvalu
    185    Saint Vincent and the Grenadines
    189                               Samoa
    Name: country, dtype: object
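
    The right join plus null check used here can also be written with merge’s indicator=True option, which adds a _merge column recording where each row came from. A minimal sketch on toy rows (stand-ins for the real tables, not the notebook’s own code):

```python
import pandas as pd

# Toy stand-ins: Angola has no business row, mirroring the situation above
toy_businesses = pd.DataFrame({
    "business": ["Kongō Gumi", "La Casa de Moneda de México"],
    "country_code": ["JPN", "MEX"],
})
toy_countries = pd.DataFrame({
    "country_code": ["AGO", "JPN", "MEX"],
    "country": ["Angola", "Japan", "Mexico"],
})

# Right join keeps every country; _merge is "right_only" where no business matched
toy_all = toy_businesses.merge(
    toy_countries, on="country_code", how="right", indicator=True
)
toy_missing = toy_all.loc[toy_all["_merge"] == "right_only", "country"]
print(toy_missing.tolist())
```

    Filtering on _merge avoids relying on a particular column being null, which matters if that column could legitimately contain missing values of its own.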

    5. Adding new oldest business data

    It looks like we’ve got some holes in our dataset! Fortunately, we’ve taken it upon ourselves to improve upon BusinessFinancing.co.uk’s work and find oldest businesses in a few of the missing countries. We’ve stored the newfound oldest businesses in new_businesses, located at "datasets/new_businesses.csv". It has the exact same structure as our businesses dataset.

    new_businesses

    column          type      meaning
    business        varchar   Name of the business.
    year_founded    int       Year the business was founded.
    category_code   varchar   Code for the category of the business.
    country_code    char      ISO 3166-1 3-letter country code.

    All we have to do is combine the two so that we’ve got one more complete list of businesses!

    In [5]:

    # Import new_businesses.csv
    new_businesses = pd.read_csv('datasets/new_businesses.csv')

    # Add the data in new_businesses to the existing businesses
    all_businesses = pd.concat([new_businesses, businesses])

    # Merge and filter to find countries with missing business data
    new_all_countries = all_businesses.merge(countries, on='country_code', how='right')
    new_missing_countries = new_all_countries[new_all_countries.business.isnull()]

    # Group by continent and create a "count_missing" column
    count_missing = pd.DataFrame(new_missing_countries.groupby('continent')['country'].count())
    count_missing.columns = ["count_missing"]
    count_missing
    

    Out[5]:

    Continents      Count Missing
    Africa          3
    Asia            7
    Europe          2
    North America   5
    Oceania         10
    South America   3
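
    A small detail of the concatenation step: pd.concat keeps each input’s original row labels unless told otherwise, so the combined frame can end up with repeated index values. The sketch below uses two one-row toy frames with the same four columns; "Hypothetical Trading Co" and its values are invented for illustration and do not come from new_businesses.csv.

```python
import pandas as pd

# One real row from the narrative and one clearly hypothetical addition
existing = pd.DataFrame({
    "business": ["Kongō Gumi"],
    "year_founded": [578],
    "category_code": ["CAT6"],
    "country_code": ["JPN"],
})
additions = pd.DataFrame({
    "business": ["Hypothetical Trading Co"],  # made-up row
    "year_founded": [1850],
    "category_code": ["CAT1"],
    "country_code": ["AGO"],
})

# Stack the rows; without ignore_index both rows would carry the label 0
stacked = pd.concat([additions, existing])
combined = pd.concat([additions, existing], ignore_index=True)

print(stacked.index.tolist())   # labels copied from the inputs
print(combined.index.tolist())  # fresh 0..n-1 labels
```

    Repeated labels do not affect the merge that follows, since merging works on columns, but a fresh index makes later .loc lookups unambiguous.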

    6. The oldest industries

    Remember our oldest business in the world, Kongō Gumi?

        business    year_founded  category_code  country_code
    64  Kongō Gumi  578           CAT6           JPN

    We know Kongō Gumi was founded in the year 578 in Japan, but it’s a little hard to decipher which industry it’s in. Information about what the category_code column refers to is in "datasets/categories.csv":

    categories

    column          type      meaning
    category_code   varchar   Code for the category of the business.
    category        varchar   Description of the business category.

    Let’s use categories.csv to understand how many oldest businesses are in each category of industry.

    In [6]:

    # Import categories.csv and merge to businesses
    categories = pd.read_csv("datasets/categories.csv")
    businesses_categories = businesses.merge(categories, on="category_code")

    # Create a DataFrame which lists the number of oldest businesses in each category
    count_business_cats = businesses_categories.groupby("category").agg({"year_founded": "count"})

    # Create a DataFrame which sums year_founded within each category
    # (this adds up founding years; it is not the elapsed time each business has operated)
    years_business_cats = businesses_categories.groupby("category").agg({"year_founded": "sum"})

    # Rename columns and display the first five rows of both DataFrames
    count_business_cats.columns = ["count"]
    years_business_cats.columns = ["total_years_in_business"]

    display(count_business_cats.head(), years_business_cats.head())
    
    category                    count
    Agriculture                 6
    Aviation & Transport        19
    Banking & Finance           37
    Cafés, Restaurants & Bars   6
    Conglomerate                3

    category                    total_years_in_business
    Agriculture                 10669
    Aviation & Transport        36598
    Banking & Finance           70302
    Cafés, Restaurants & Bars   8532
    Conglomerate                5671
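
    Note that the total_years_in_business column above is the sum of the year_founded values in each category, so it grows with the number of businesses rather than with how long they have operated. If elapsed years are what we want, one option is to subtract each founding year from a reference year before summing. The sketch below assumes a reference year of 2020 (an arbitrary choice made here, not something stated in the dataset) and uses toy rows; "Example Category" and its 1900 entry are made up.

```python
import pandas as pd

REFERENCE_YEAR = 2020  # assumption: the year we measure "still operating" up to

# Toy rows: three founding years from the restaurant category plus one hypothetical row
toy = pd.DataFrame({
    "category": [
        "Cafés, Restaurants & Bars",
        "Cafés, Restaurants & Bars",
        "Cafés, Restaurants & Bars",
        "Example Category",  # made-up category
    ],
    "year_founded": [803, 900, 1153, 1900],
})

# Elapsed years per business, then count and total per category
toy["years_operating"] = REFERENCE_YEAR - toy["year_founded"]
summary = toy.groupby("category").agg(
    count=("year_founded", "count"),
    total_years_in_business=("years_operating", "sum"),
)
print(summary)
```

    With this definition, an older business contributes more to its category’s total instead of less.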

    7. Restaurant representation

    No matter how we measure it, it looks like Banking & Finance is an excellent industry to be in if longevity is our goal! Let’s zoom in on another industry: cafés, restaurants, and bars. Which restaurants in our dataset have been around since before the year 1800?

    In [7]:

    # Filter using .query() for CAT4 businesses founded before 1800
    old_restaurants = businesses_categories.query("(category_code == 'CAT4') & (year_founded < 1800)")

    # Sort the DataFrame from oldest to most recent
    old_restaurants = old_restaurants.sort_values("year_founded")
    old_restaurants
    

    Out[7]:

         business                            year_founded  category_code  country_code  category
    142  St. Peter Stifts Kulinarium         803           CAT4           AUT           Cafés, Restaurants & Bars
    143  Sean’s Bar                          900           CAT4           IRL           Cafés, Restaurants & Bars
    139  Ma Yu Ching’s Bucket Chicken House  1153          CAT4           CHN           Cafés, Restaurants & Bars
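
    If the cutoff year should be adjustable, .query() can read a Python variable by prefixing its name with @, and the same filter can be written as an ordinary boolean mask. A minimal sketch on toy rows, where "Hypothetical Café" is a made-up entry and the variable name cutoff is our own choice:

```python
import pandas as pd

# Toy rows; "Hypothetical Café" is invented to show a row that gets filtered out
toy = pd.DataFrame({
    "business": ["St. Peter Stifts Kulinarium", "Sean’s Bar", "Hypothetical Café", "Kongō Gumi"],
    "year_founded": [803, 900, 1950, 578],
    "category_code": ["CAT4", "CAT4", "CAT4", "CAT6"],
})

cutoff = 1800  # hypothetical parameter name

# @cutoff refers to the Python variable defined above
via_query = toy.query("category_code == 'CAT4' and year_founded < @cutoff")

# Equivalent boolean-mask version
via_mask = toy[(toy["category_code"] == "CAT4") & (toy["year_founded"] < cutoff)]

print(via_query.sort_values("year_founded"))
```

    Both forms select the same rows; which one reads better is largely a matter of taste.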

    8. Categories and continents

    St. Peter Stifts Kulinarium is old enough that the restaurant is believed to have served Mozart – and it would have been over 900 years old even when he was a patron! Let’s finish by looking at the oldest business in each category of commerce for each continent.

    In [8]:

    # Merge businesses, categories, and countries together
    businesses_categories_countries = businesses.merge(categories, on="category_code").merge(countries, on="country_code")

    # Sort businesses_categories_countries from oldest to most recent
    businesses_categories_countries = businesses_categories_countries.sort_values("year_founded")

    # Create the oldest by continent and category DataFrame
    oldest_by_continent_category = pd.DataFrame(businesses_categories_countries.groupby(['continent', 'category'])['year_founded'].min())
    oldest_by_continent_category.head()
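
    Grouping by two keys produces a result indexed by (continent, category) pairs. To see what that looks like, and how it can be pivoted into a continent-by-category grid, here is a minimal sketch on toy rows. The restaurant years are taken from the table in the previous section; "Example Category" and its 1700 entry are made up.

```python
import pandas as pd

# Toy rows; the "Example Category" row is hypothetical
toy = pd.DataFrame({
    "continent": ["Europe", "Europe", "Europe", "Asia"],
    "category": [
        "Cafés, Restaurants & Bars",
        "Cafés, Restaurants & Bars",
        "Example Category",
        "Cafés, Restaurants & Bars",
    ],
    "year_founded": [803, 900, 1700, 1153],
})

# Earliest founding year for every (continent, category) pair
oldest = toy.groupby(["continent", "category"])["year_founded"].min()
print(oldest)

# Optional: pivot categories into columns; pairs with no business become NaN
wide = oldest.unstack("category")
print(wide)
```

    The wide layout makes gaps easy to spot, since any continent without a business in a given category shows up as a missing value.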
    


    Hello !!! I’m Ram Rallabandi

    an AI consultant and data scientist with a proven track record of delivering high-impact AI solutions. My mission is simple: transform complex AI strategies into measurable business outcomes.
