Tuesday, April 28, 2009

Calling amateur mathematicians

Given a set of x-y values like the above, where X=0%, Y=0%*, and at X=100%, Y=infinity.

How do I create a function in Excel to fit a trendline for this data?  The answer will power a cool analysis that I've been putting together.

Would it be better if I turned it into a Lorenz Curve by indexing the Y-values?

* Yes, I know (0,0) is not plottable on a log chart.  It's not about plotting it on a chart, it's about establishing a function to fit this curve.


Consultant Insider said...

Try y = -7/(x-1)

Seems to fit pretty well. Don't know helpful it will be though.

akznetsov said...

Try these:

1) y = 0,3115exp(0,5227x)
(R2 = 0,9595)

2) y = 0,0398x6 - 1,6883x5 + 27,6x4 - 217,62x3 + 847,63x2 - 1480,7x + 851,89
(R2 = 0,9901)

Fits good but for the practical use... it depends


Steve Shu said...

akznetsov looks like he is on the right track. #1 looks better to me (presuming the coefficients are right - seems so since R squared value was given). #2 seems like it could be an overfit regression. Maybe try the regression with fewer independent variables as opposed to 6.

Steve Shu said...

Also, I don't know if you need to get a plug-in for Excel to do the regression analysis. See this link as one possible source.


I am on the new Excel. I can barely figure out how to format cells anymore, the interface is so different.

Mason said...
This comment has been removed by the author.
Mason said...

From the looks of the data I can see (assumptions may be violated/need to be checked based on the data sources), I have a few suggestions.

The most immediate models that come to mind are:

I would also try a piecewise regression--perhaps try a cubic until a point k, and then try either of the log models for >k.

As far as the Lorenz curve...I don't think I can give an intelligent answer without knowing the nature of the data, but I would only try it if the other models fail.

Hope that helps! I enjoy reading your blog :)

Consultant Ninja said...


Consultant Ninja said...

The data is IRS tax data.


X-value: Percentile of the US population below a certain income.

Y-value: That income value.

Interesting stuff here. I'll have to chew through it. Thanks.

Tuesday, April 28, 2009