How to Create Your Own Simple Linear Regression Equation

Updated on February 27, 2020
The relationship between ice cream sales and the outdoor temperature can be represented with a simple regression equation.

Regression equations are frequently used by scientists, engineers, and other professionals to predict a result given an input. Regression equations are developed from a set of data obtained through observation or experimentation. There are many types of regression equations, but the simplest one is the linear regression equation. A linear regression equation is simply the equation of a line that is a “best fit” for a particular set of data. Even though you may not be a scientist, engineer, or mathematician, simple linear regression equations can find good uses in anyone’s daily life.

What is a Linear Regression Equation?

A linear regression equation takes the same form as the equation of a line and is often written in the following general form: y = A + Bx

Where ‘x’ is the independent variable (your known value) and ‘y’ is the dependent variable (the predicted value). The letters ‘A’ and ‘B’ represent constants that describe the y-axis intercept and the slope of the line.

A scatter plot and regression equation of age versus cat ownership.

The image at right shows a set of data points and a “best fit” line that is the result of a regression analysis. As you can see, the line does not actually pass through all of the points. The distance between any point (observed or measured value) and the line (predicted value) is called the error. The smaller the errors are, the better the equation fits the data and the better it tends to be at predicting unknown values. The line of ‘best fit’ is the one for which the sum of the squared errors is as small as possible.
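To make the idea of error concrete, here is a minimal Python sketch. The four data points and the function name sum_squared_errors are invented purely for illustration and are not the article's cat data. It assumes the usual least-squares convention, in which the errors are measured as vertical distances and are squared before being added up.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# A rough guess at a line, and the least-squares line for this data (A = 0, B = 1.9).
guess = sum_squared_errors(xs, ys, a=1.0, b=1.5)
best = sum_squared_errors(xs, ys, a=0.0, b=1.9)

print(f"guess line: {guess:.2f}")
print(f"best-fit line: {best:.2f}")
```

For these made-up points the guessed line has a total squared error of about 1.50, while the least-squares line brings it down to about 0.70. No other straight line gives a smaller total for this data.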

If you have a spreadsheet program such as Microsoft Excel, then creating a simple linear regression equation is a relatively easy task. After you have input your data into a table format, you can use the chart tool to make a scatter-plot of the points. Next, simply right-click on any data point and select “add trend line” to bring up the regression equation dialogue box. Select the linear trend line for the type. Go to the options tab and be sure to check the boxes to display the equation on the chart. Now you can use the equation to predict new values whenever you need to.

Not every pair of variables in the world is going to have a linear relationship. Many things are better described using exponential or logarithmic equations rather than linear equations. However, that doesn’t preclude any of us from trying to describe something simply. What really matters here is how accurately the linear regression equation describes the relationship of the two variables. If there is good correlation between the variables, and the relative error is small, then the equation is deemed accurate and can be used to make predictions about new situations.

What if I don’t have a Spreadsheet or Statistics Program?

Even if you don’t have a spreadsheet program like Microsoft Excel, you can still derive your own regression equation from a small dataset with relative ease (and a calculator). Here is how you do it:

1. Create a table using the data that you have recorded from either an observation or an experiment. Label the independent variable ‘x’ and the dependent variable ‘y’.

2. Next, add 3 more columns to your table. The first column should be labeled ‘xy’ and should reflect the product of the ‘x’ and ‘y’ values in your first two columns. The next column should be labeled ‘x²’ and should reflect the square of the ‘x’ value. The final column should be labeled ‘y²’ and reflect the square of the ‘y’ value.

3. After you have added the three additional columns, you should add a new row to the bottom that totals the values of the numbers in the column above it. When you are done you should have a completed table that looks similar to the one below:

X (Age) | Y (Cats) | xy | x² | y²

4. Next, use the following two equations to calculate what the constants ‘A’ and ‘B’ are in the linear equation. Note that from the above table ‘n’ is the sample size (number of data points) which in this case is 15.
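For reference, the standard least-squares formulas for the two constants can be written in terms of the column totals from the table. This is the textbook form and is assumed to match the equations intended here. Note that the y² total is not needed for either constant.

```latex
B = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}
\qquad
A = \frac{\sum y \sum x^{2} - \sum x \sum xy}{n\sum x^{2} - \left(\sum x\right)^{2}}
```

Once B is known, A can equivalently be found from the averages: A = (mean of y) - B × (mean of x).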


In the above example relating age to cat ownership, if we use the equations shown above we get A = 0.29344962 and B = 0.0629059. Therefore our linear regression equation is Y = 0.293 + 0.0629x. This matches the equation that was generated from Microsoft Excel (see the scatter plot above).
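The by-hand procedure in steps 1 through 4 can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name fit_line is hypothetical, the formulas are the standard least-squares ones written in terms of the column totals, and the data is the seven-point dataset from the Questions & Answers section rather than the cat-ownership data.

```python
def fit_line(xs, ys):
    """Return (A, B) for y = A + B*x using the column totals from the table method."""
    n = len(xs)                                   # sample size
    sum_x = sum(xs)                               # total of the 'x' column
    sum_y = sum(ys)                               # total of the 'y' column
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # total of the 'xy' column
    sum_x2 = sum(x * x for x in xs)               # total of the 'x squared' column

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return a, b

# Dataset from the Questions & Answers section of this article.
xs = [10, 5, 8, 20, 2, 24, 8]
ys = [5, 15, 12, 6, 30, 6, 10]

a, b = fit_line(xs, ys)
print(f"A = {a:.5f}, B = {b:.5f}")
print(f"prediction at x = 15: {a + b * 15:.2f}")
```

For this dataset the sketch gives A ≈ 20.52073 and B ≈ -0.77461, which agrees with the equation quoted in the Questions & Answers section.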

As you can see, creating a simple linear regression equation is very easy, even when it is completed by hand.

How Accurate is my Regression Equation?

When talking about regression equations you may hear about something called the Coefficient of Determination (or R² value). This is a number between 0 and 1 that tells you what fraction of the variation in ‘y’ is explained by the equation. The closer the R² value is to 1, the better the equation fits the data. Microsoft Excel can calculate the R² value for you very easily. There is a way to calculate the R² value by hand but it is quite tedious. Perhaps that will be another article that I will write in the future.
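For a simple linear regression with one input variable, the R² value equals the square of Pearson's correlation coefficient, so it can be computed from the same column totals used for A and B plus the total of the y² column (presumably the reason that column is in the table). The sketch below assumes that relationship; the function name r_squared is hypothetical, and the data is again the seven-point dataset from the Questions & Answers section.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit, from column totals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)   # this is where the y-squared column is used

    numerator = (n * sum_xy - sum_x * sum_y) ** 2
    denominator = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("x or y has no variation; R squared is undefined")
    return numerator / denominator

# Dataset from the Questions & Answers section of this article.
xs = [10, 5, 8, 20, 2, 24, 8]
ys = [5, 15, 12, 6, 30, 6, 10]

r2 = r_squared(xs, ys)
print(f"R squared = {r2:.4f}")
```

For this dataset R² comes out at roughly 0.51, meaning the line accounts for about half of the variation in the y values.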

Examples of Other Potential Applications

In addition to the above example, there are several other things that regression equations can be used for. In fact, the list of possibilities is endless. All that is really needed is a desire to represent the relationship of any two variables with a linear equation. Below is a brief list of ideas that regression equations can be developed for.

  • Comparing the amount of money spent on Christmas gifts given the number of people you have to buy for
  • Comparing the amount of food needed for dinner given the number of people that are going to be eating
  • Describing the relationship between how much TV you watch and how many calories you consume
  • Describing how the number of times you do laundry relates to the length of time clothes remain wearable
  • Describing the relationship between the average daily temperature and the number of people seen at the beach or a park
  • Describing how your electricity usage relates to the average daily temperature
  • Correlating the number of birds observed in your backyard with the amount of birdseed that you left outside
  • Relating the size of a house with the amount of electricity that is needed to operate and maintain it
  • Relating the size of a house with the price for a given location
  • Relating the height versus the weight of everyone in your family

These are just a few of the endless things that regression equations can be used for. As you can see, there are many practical applications for these equations in our everyday life. Wouldn’t it be great to make reasonably accurate predictions about various things that we experience each and every day? I sure think so! Using this relatively simple mathematical procedure, I hope that you’ll find new ways to bring order to things that would otherwise be described as unpredictable.

Questions & Answers

  • Q1. The following table represents a set of data on two variables Y and X. (a) Determine the linear regression equation Y = a + bX. Use your line to estimate Y when X = 15. (b) Calculate Pearson’s correlation coefficient between the two variables. (c) Calculate Spearman's correlation.

    Y: 5, 15, 12, 6, 30, 6, 10
    X: 10, 5, 8, 20, 2, 24, 8

    Given the set of numbers Y = 5,15,12,6,30,6,10 and X = 10,5,8,20,2,24,8 the equation of a simple linear regression model becomes: Y = -0.77461X +20.52073.

    When X is equal to 15, the equation predicts a Y value of approximately 8.90.

    Next, to calculate the Pearson Correlation Coefficient, we use the equation r = (sum(x-xbar)(y-ybar))/(root(sum(x-xbar)^2 sum(y-ybar)^2)).

    Inserting values, the equation becomes r = (-299)/(root((386)(458))) = -299/420.4617.

    Therefore, Pearson's Correlation Coefficient is -0.71112.

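    Finally, to calculate Spearman's Correlation, we use the following equation: p = 1 - [(6(sum(d^2)))/(n(n^2-1))]

    To use the equation we first rank the data (tied values share the average of their ranks), then calculate the difference in rank and the squared difference in rank for each pair. The sample size, n, is 7 and the sum of the squared rank differences is 103.

    Solving p = 1 - ((6)(103))/(7(7^2-1)) = 1 - (618)/(336) = 1 - 1.839286 = -0.83929

    Therefore, Spearman's Correlation is approximately -0.83929. Because both variables contain tied values, this shortcut formula is only approximate; calculating Pearson's coefficient on the ranks instead gives -0.87273.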
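    The two correlation calculations can be checked with a short Python sketch. The function names are hypothetical, and the ranking assumes the common convention that tied values share the average of their ranks. With ties present, the shortcut formula 1 - 6(sum(d^2))/(n(n^2-1)) is only an approximation, so the sketch also computes Pearson's coefficient on the ranks for comparison.

```python
import math

def pearson(xs, ys):
    """Pearson's r computed from deviations about the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # 1-based position of the first occurrence
        count = ordered.count(v)          # how many values are tied with v
        ranks.append(first + (count - 1) / 2)
    return ranks

def spearman_shortcut(xs, ys):
    """Return (rho, sum of squared rank differences) using the shortcut formula."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    n = len(xs)
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2

xs = [10, 5, 8, 20, 2, 24, 8]
ys = [5, 15, 12, 6, 30, 6, 10]

r = pearson(xs, ys)
rho, sum_d2 = spearman_shortcut(xs, ys)
rho_on_ranks = pearson(average_ranks(xs), average_ranks(ys))

print(f"Pearson r = {r:.5f}")
print(f"sum of squared rank differences = {sum_d2}")
print(f"Spearman (shortcut formula) = {rho:.5f}")
print(f"Spearman (Pearson on ranks) = {rho_on_ranks:.5f}")
```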


    • abdullah imtiaz (2 months ago): why calculate y2 when we don't even use it

    • Bob longnecker (4 months ago): If I have a right triangle with the following given: A = 3.6. Three angles are 90, 1 and 89. What is the length of side B?

    • (7 years ago): Wow, great explanation of how to use the formula and determine the coefficients from the raw data.
