Improving the speed of Funq

Developer
Mar 15, 2009 at 12:59 AM
I've updated the Performance sample to run with the latest version of Funq and was disappointed to see it wasn't faster than Unity.  I went looking for optimizations and discovered an improvement to ServiceKey.  I've created an Issue in the Issue Tracker for this.

I may experiment with alternate collection implementations that may speed up the lookup process.  Does anyone else have some ideas?

Matthew
Coordinator
Mar 15, 2009 at 5:31 AM
Mmm... interesting. Have to keep an eye on that!

Are you planning on using a profiler to diagnose?

Thanks a lot for looking into this one.

Developer
Mar 15, 2009 at 2:07 PM
I just used your Performance sample.  I did use the VS Performance Wizard but didn't get much info.  Need to instrument the Funq.dll as well.

An interesting challenge, and I'll keep at it.  I think I'll keep notes and blog the results.
Developer
Mar 16, 2009 at 12:02 AM
While watching golf I ran the profiler on the code and made the following changes, listed in order of the performance issues they address:
  1. Used a HybridDictionary instead of the generic Dictionary for services in the Container class
  2. Improved the Equals methods
  3. Changed the properties on ServiceKey and ServiceEntry to public fields

the final results are:

Running 1000 iterations for each use case.
Plain no-DI:      54539
Autofac:        2150495
Ninject:        9686360
StructureMap:    829774
Unity:          1682773
Funq:           1826574

Mar 16, 2009 at 12:13 AM
That's interesting. Doesn't StructureMap use reflection? How is it so fast?

Coordinator
Mar 16, 2009 at 12:26 AM

Nope. For the comparison to be fair, I'm using the most efficient registration for each container, and they all have a way of registering with lambdas...

Still, it's surprising that we are now only on par with Unity, given where we were just about a month ago in the screencast... need to investigate!

Also, the improvements didn't seem to get us back on top (and by a relevant margin)  :(

/kzu from mobile

Developer
Mar 16, 2009 at 5:41 AM
In the performance sample, the Registration time isn't measured, only the Resolve time.  Are all the DI containers configured for the equivalent of Reuse.None and Owner.External?
If I turn on Reuse.Hierarchy then the Funq time goes down to about 3 times the plain no-DI time.
Coordinator
Mar 16, 2009 at 1:12 PM
They are all registered for no reuse, yes. The objects being created are not IDisposable, so Owner shouldn't make a difference.
Turning on reuse causes caching of the instance, which is what we're trying not to show there (as the no-DI scenario doesn't have an equivalent, other than direct access to a cached instance in a dictionary maybe?). Maybe we should show that too, separately?

/kzu
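To make the reuse point concrete, here is a minimal sketch of how a reuse setting changes what Resolve measures. It is not Funq's or Munq's code: `TinyContainer`, `ReuseMode` and `Foo` are hypothetical names made up for illustration, and the sketch ignores thread safety, naming and container hierarchies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; this is not the Funq or Munq API.
public enum ReuseMode { None, Container }

public sealed class TinyContainer
{
    private sealed class Entry
    {
        public object Factory;   // actually a Func<T>, cast back on resolve
        public ReuseMode Mode;
        public object Cached;    // only used when Mode == ReuseMode.Container
    }

    private readonly Dictionary<Type, Entry> entries = new Dictionary<Type, Entry>();

    public void Register<T>(Func<T> factory, ReuseMode mode) where T : class
    {
        entries[typeof(T)] = new Entry { Factory = factory, Mode = mode };
    }

    public T Resolve<T>() where T : class
    {
        Entry entry = entries[typeof(T)];
        var factory = (Func<T>)entry.Factory;

        if (entry.Mode == ReuseMode.None)
            return factory();              // a new instance on every call

        if (entry.Cached == null)
            entry.Cached = factory();      // created once, then served from the cache
        return (T)entry.Cached;
    }
}

public sealed class Foo { }

public static class ReuseDemo
{
    public static void Main()
    {
        var container = new TinyContainer();

        container.Register<Foo>(() => new Foo(), ReuseMode.None);
        Console.WriteLine(ReferenceEquals(container.Resolve<Foo>(), container.Resolve<Foo>())); // False

        container.Register<Foo>(() => new Foo(), ReuseMode.Container);
        Console.WriteLine(ReferenceEquals(container.Resolve<Foo>(), container.Resolve<Foo>())); // True
    }
}
```

With reuse on, every Resolve after the first is a dictionary lookup plus a cast, so the timing no longer includes object construction. That is why it is not comparable with the plain no-DI case that calls `new` each time.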
Developer
Mar 17, 2009 at 3:11 PM
Edited Mar 17, 2009 at 3:13 PM
After trying to speed up Funq and not getting much more improvement, I decided to try rewriting it with performance in mind.  My result, called Munq, takes only 10% of the time Funq needs to execute the performance test sample, and only 2-3 times the plain no-DI case.  The only major restriction is that I don't allow passing additional parameters on Registration or Resolve.  Who really uses this feature anyway?

I used a lot of Funq design and some code.  Eliminated the non-Generic ServiceEntry.
You can get the source here.

Munq

A very simple Dependency Injection (DI) container based on Funq.

Requirements

  1. No Reflection
  2. Fast
  3. Resolve does not allow passing of parameters
  4. Use Lambda expressions to define method to create instance
  5. Use Generics to express Interface or Class to instantiate
  6. Parent-Child container hierarchy
  7. Container or External ownership

Performance Test

Running 1000 iterations for each use case.

DI Container    Duration for 1000 iterations
Plain no-DI       29534
Funq             919207
Munq              79579
Autofac         1126281
Ninject         5294943
StructureMap     452011
Unity            878690

Developer
Mar 17, 2009 at 3:20 PM
For the Performance sample, the MunqUseCase is identical to the FunqUseCase.  Just change all Funq to Munq and add a reference to the Munq.dll (Release).
Coordinator
Mar 17, 2009 at 3:42 PM
Awesome!!!
Question: do you think the passing of arguments is the cause of the speed decrease in Funq?
I think the feature should not have an associated cost unless you actually use it, so I'll give it a second look.
How about I make you a contributor on Funq and we create a branch for Munq and try to apply its principles and come up with the greatest and fastest DI ever? :))
Maybe there's a way by which we can incrementally "opt-in" features without paying the cost up-front for those you don't use? (maybe a different project altogether with different impl. strategies and public API built-in? kind of Funq.dll, Funq.Advanced.dll, etc.?)

/kzu

--
Daniel Cazzulino | Developer Lead | XML MVP | Clarius Consulting | +1 425.329.3471


Developer
Mar 17, 2009 at 4:24 PM
I would love to be a contributor.  I seem to have become intrigued with this problem/project.

I think the reason Funq is slow is the cost of creating the delegate to pass the parameters to the factory.  Since Munq does not have to create this delegate and can call the factory directly, it does not incur this cost.

In other words, it's OK for Registration to be a little slow, but Resolution should be blindingly fast.
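A rough sketch of what an extra delegate layer on the resolve path can look like, next to a direct call. This is an illustration under assumptions, not the actual Funq or Munq source: `MiniContainer`, `ResolveViaInvoker`, `ResolveDirect` and `Widget` are invented names, and `InvokerCalls` exists only to make the extra hop visible.

```csharp
using System;
using System.Collections.Generic;

public sealed class Widget { }

// Hypothetical container used only to contrast the two call paths.
public sealed class MiniContainer
{
    // One factory per service type, stored as object and cast back on resolve.
    private readonly Dictionary<Type, object> factories = new Dictionary<Type, object>();

    public int InvokerCalls;   // counts trips through the extra delegate layer

    public void Register<T>(Func<MiniContainer, T> factory)
    {
        factories[typeof(T)] = factory;
    }

    // Variant 1: a shared helper receives an "invoker" delegate. Because the
    // lambda captures `this`, each call creates a new delegate object, and the
    // factory is reached through two delegate invocations instead of one.
    public T ResolveViaInvoker<T>()
    {
        return ResolveImpl<T, Func<MiniContainer, T>>(f => { InvokerCalls++; return f(this); });
    }

    private T ResolveImpl<T, TFunc>(Func<TFunc, T> invoker)
    {
        var factory = (TFunc)factories[typeof(T)];
        return invoker(factory);
    }

    // Variant 2: cast to the typed factory and call it directly.
    public T ResolveDirect<T>()
    {
        var factory = (Func<MiniContainer, T>)factories[typeof(T)];
        return factory(this);
    }
}

public static class DelegateLayerDemo
{
    public static void Main()
    {
        var container = new MiniContainer();
        container.Register<Widget>(c => new Widget());

        Widget viaInvoker = container.ResolveViaInvoker<Widget>();
        Widget direct = container.ResolveDirect<Widget>();

        Console.WriteLine(viaInvoker != null && direct != null); // True
        Console.WriteLine(container.InvokerCalls);               // 1
    }
}
```

The shared helper keeps the lookup logic in one place, which is convenient when Resolve has many overloads. The direct variant trades that for one fewer allocation and one fewer invocation per call.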
Coordinator
Mar 17, 2009 at 8:26 PM
I'm confused... so is the slowness of Funq in the registration or in the resolution? (I guess the latter, 'cause of the intermediate lambda...).

You're a contributor now!
Please go ahead and create a branch inside your branches\mdennis area :)

thanks!

/kzu

Developer
Mar 17, 2009 at 9:11 PM
It's in the Resolution.  It really doesn't matter how long Registration takes (up to a point); it's the Resolve function that gets called over and over again, especially in a web app.

I'll create a branch.  I think I know how to get around the extra layer of delegate.  It might be a little slower for Resolve with parameters, but not much if it works :)
Matthew
Coordinator
Mar 17, 2009 at 9:18 PM
keep me posted!

/kzu

Developer
Mar 18, 2009 at 2:38 AM
Checked in Munq under my branch.  I included the Performance sample under the core as I use it to judge performance impact of changes.

Under samples I included my version of Stephen Walther's ContactManager sample, modified to use Munq.  Two files are involved: Global.asax and MunqControllerFactory.cs.

I'm going to see about adding parameters.
Developer
Mar 18, 2009 at 4:11 AM
Edited Mar 18, 2009 at 4:17 AM
(Edited: the first version of this post did not have Reuse set to None. With that fixed, Munq is still the FASTEST DI.)

Very Interesting!!!
The Performance program has a bug.
You need to run more than one pass of the tests.  On the first pass the DLLs are loaded, which adds to the time.
The second and subsequent passes have different, lower values.

usage: Performance.exe numerOfIterations
Defaulting to 1000 iterations
Running 1000 iterations for each use case.
Plain no-DI: 137893
Funq: 1110457
Munq: 171124
Autofac: 1465551
Ninject: 7915695
StructureMap: 460789
Unity: 887794
'R' to rerun, Press any other key to exit.
Running 1000 iterations for each use case.
Plain no-DI: 4294
Funq: 830455
Munq: 84919
Autofac: 920800
Ninject: 4109211
StructureMap: 274429
Unity: 693820
'R' to rerun, Press any other key to exit.
Running 1000 iterations for each use case.
Plain no-DI: 4475
Funq: 1137381
Munq: 83580
Autofac: 911963
Ninject: 5130095
StructureMap: 289894
Unity: 735667
'R' to rerun, Press any other key to exit.
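The first-pass effect in the output above can be handled with an untimed warm-up pass. The following is a minimal sketch, not the Performance sample's actual code: `WarmupBenchmark`, `Measure` and `MeasureWarm` are hypothetical names, and reporting `Stopwatch` ticks is an assumption, since the thread does not state the unit of its figures.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; the real Performance sample may be structured differently.
public static class WarmupBenchmark
{
    // Times `iterations` calls of `useCase` and returns the elapsed Stopwatch ticks.
    public static long Measure(Action useCase, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            useCase();
        watch.Stop();
        return watch.ElapsedTicks;
    }

    // Runs one untimed pass first, so assembly loading and JIT compilation
    // are paid before the pass whose figure is reported.
    public static long MeasureWarm(Action useCase, int iterations)
    {
        Measure(useCase, iterations);          // warm-up, result discarded
        return Measure(useCase, iterations);   // reported figure
    }

    public static void Main()
    {
        // Stand-in for a container's Resolve call.
        Action useCase = () => new object().GetHashCode();

        Console.WriteLine("Cold: " + Measure(useCase, 1000));
        Console.WriteLine("Warm: " + MeasureWarm(useCase, 1000));
    }
}
```

Rerunning with 'R', as in the output above, achieves the same thing by hand; building the warm-up into the harness just makes the first reported number comparable with the rest.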
Coordinator
Mar 18, 2009 at 11:11 AM

Wow!

Are you sure no caching of instances is happening? Is the full test suite (save the args) passing with Munq?

/kzu from mobile

On Mar 18, 2009 12:11 AM, "mdennis" <notifications@codeplex.com> wrote:

From: mdennis

Very Interesting!!!
The Performance program has a bug. 
You need to run more than one pass of the tests.  On the first pass, the DLLs are loaded and add to the time.
The second and subsequent passes have different and lower values. Munq is faster than Plain!!?? Have to check this out!!!

usage: Performance.exe numerOfIterations
Defaulting to 1000 iterations

Running 1000 iterations for each use case.

Plain no-DI: 138484
Funq: 1050408
Munq: 86870
Autofac: 1227461
Ninject: 5685854
StructureMap: 468921
Unity: 978063
'R' to rerun, Press any other key to exit.

Running 1000 iterations for each use case.

Plain no-DI: 4165
Funq: 808337
Munq: 2829
Autofac: 924765
Ninject: 4172769
StructureMap: 282948
Unity: 676295
'R' to rerun, Press any other key to exit.

Running 1000 iterations for each use case.

Plain no-DI: 4307
Funq: 810051
Munq: 2730
Autofac: 925984
Ninject: 4112949
StructureMap: 274549
Unity: 700091
'R' to rerun, Press any other key to exit.

Running 1000 iterations for each use case.

Plain no-DI: 4706
Funq: 853184
Munq: 2711
Autofac: 901787
Ninject: 4248691
StructureMap: 296767
Unity: 769950
'R' to rerun, Press any other key to exit.

Running 1000 iterations for each use case.

Plain no-DI: 5137
Funq: 856990
Munq: 2724
Autofac: 945415
Ninject: 4628935
StructureMap: 295524
Unity: 719453
'R' to rerun, Press any other key to exit.

Running 1000 iterations for each use case.

Plain no-DI: 4393
Funq: 819837
Munq: 2723
Autofac: 924262
Ninject: 4115018
StructureMap: 288831
Unity: 679926
'R' to rerun, Press any other key to exit.

Running 1000 iterations for each use case.

Plain no-DI: 4363
Funq: 812646
Munq: 2710
Autofac: 913286
Ninject: 4210671
StructureMap: 257122
Unity: 680419
'R' to rerun, Press any other key to exit.

Running 1000 iterations for each use case.

Plain no-DI: 4367
Funq: 813263
Munq: 2708
Autofac: 921193
Ninject: 4100516
StructureMap: 262944
Unity: 690422
'R' to rerun, Press any other key to exit.

Developer
Mar 18, 2009 at 1:31 PM
Edited Mar 18, 2009 at 1:35 PM
I edited my previous post.  I had Reuse=Hierarchy by mistake (I changed the container defaults to match Funq without updating the Performance sample code).

The updated entry is slower, but still 10 times faster than Funq :).

I haven't checked in the correction yet, as I am working on passing parameters.  Should be possible, just requires some refactoring.

Matthew
Developer
Mar 18, 2009 at 9:23 PM
Checked in a version that handles one parameter on the Resolve.

Running 10000 iterations for each use case.
Plain no-DI: 48767
Funq: 8597694
Munq: 851912
Autofac: 9202607
Ninject: 40886671
StructureMap: 2558292
Unity: 6872139
'R' to rerun, Press any other key to exit.
Coordinator
Mar 18, 2009 at 10:07 PM
gosh, have to check what you're doing!
can't believe the difference :(

/kzu

Developer
Mar 19, 2009 at 12:06 AM
I've got the Resolve with parameters working and ran the unit tests from Funq. All good and fast.
Checked in the source.

I haven't implemented the Lazy Load.  I don't quite understand the benefit of this.  In what scenarios would you use it?

Matthew
Coordinator
Mar 19, 2009 at 12:10 AM

Are you checking in to CodePlex under your branch? I didn't get the bits earlier today :(

/kzu from mobile

Developer
Mar 19, 2009 at 12:41 AM
Yes, I'm checking in under my branch.
Coordinator
Mar 19, 2009 at 1:38 AM

Weird...

Anyway, looks like you'll have to redo the 9 screencasts :p

/kzu from mobile

Developer
Mar 19, 2009 at 1:51 AM
Did a little refactoring.  Slight slowdown (1-2%), but reads better and the responsibilities are in the right place.
Code checked in.

mdennis
Coordinator
Mar 19, 2009 at 1:53 AM

Wouldn't it be awesome to do a screencast on refactoring together so we walk through the changes to get from the current trunk to yours? ;)

/kzu from mobile

Developer
Mar 19, 2009 at 2:47 AM
Sounds like a great idea.  I'm not quite sure of the logistics, but we should be able to work it out.  I'm busy for the next few days, but will try to find the time to put together some notes on what I did and why.

Man, can Generics get a little confusing if not handled carefully.

Matthew

MSN Messenger ID: matthew.dennis50@hotmail.com
Developer
Mar 20, 2009 at 3:23 AM
Just checked in a new version.  Cleaned up some formatting and refactored ServiceEntry.  Gained another 10%.

Running 10000 iterations for each use case.
Plain w/Reuse:         7604     -|
Funq Hierarchy:    649314     |- added to compare with Reuse
Munq Hierarchy:     42300   -|

Plain no-DI:        67836
Funq:            10253715
Munq:              785778
Autofac:         26006383
Ninject:         47883775
StructureMap:     2681709
Unity:            7241285

'R' to rerun, Press any other key to exit.

Coordinator
Mar 20, 2009 at 11:11 AM

Did you add the LazyResolve too? It's quite trivial to implement and a very useful feature

/kzu from mobile

Developer
Mar 20, 2009 at 12:10 PM
I've tried a bunch of 'optimizations' and refactorings to optimize the no parameter case.  Everything I did made it worse.  I don't think there is much improvement possible for the current code.

I haven't implemented LazyLoad (yet) as I'm not quite sure of where and why it would be used.  It would be slow due to the need to create a delegate.  That said, it wouldn't affect the speed of the other code.
Developer
Mar 20, 2009 at 12:41 PM
I understand LazyResolve now.  Added the code, tested, and checked in :)
Developer
Mar 20, 2009 at 6:39 PM
Found a bug in LazyResolve.  LazyResolve shouldn't check for the existence of the type registration, as the type may be registered after the call to LazyResolve.
When the returned func is called, it will throw if the Resolve fails.
Coordinator
Mar 20, 2009 at 6:45 PM

I'd rather have it fail early, that's why I check for the registration.

/kzu from mobile

Developer
Mar 21, 2009 at 1:48 PM
I understand. And as I think about it, your reasoning makes sense.
By the time LazyResolve is called, all the registrations should be done.
I'll make the change, it's quick.
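The two LazyResolve behaviours discussed above can be sketched side by side. This is an illustration only: `LazyContainer`, `LazyResolveUnchecked` and `Service` are hypothetical names, and the real Funq and Munq signatures may differ.

```csharp
using System;
using System.Collections.Generic;

public sealed class Service { }

// Hypothetical container showing fail-early versus deferred failure.
public sealed class LazyContainer
{
    private readonly Dictionary<Type, object> factories = new Dictionary<Type, object>();

    public void Register<T>(Func<T> factory)
    {
        factories[typeof(T)] = factory;
    }

    public T Resolve<T>()
    {
        object factory;
        if (!factories.TryGetValue(typeof(T), out factory))
            throw new InvalidOperationException("No registration for " + typeof(T).Name);
        return ((Func<T>)factory)();
    }

    // Fail-early variant: checks the registration now, creates the instance later.
    public Func<T> LazyResolve<T>()
    {
        if (!factories.ContainsKey(typeof(T)))
            throw new InvalidOperationException("No registration for " + typeof(T).Name);
        return () => Resolve<T>();
    }

    // Deferred variant: no check here, so a missing registration only
    // surfaces when the returned Func is invoked.
    public Func<T> LazyResolveUnchecked<T>()
    {
        return () => Resolve<T>();
    }
}

public static class LazyResolveDemo
{
    public static void Main()
    {
        var container = new LazyContainer();

        // Deferred variant tolerates registration after the call.
        Func<Service> later = container.LazyResolveUnchecked<Service>();
        container.Register<Service>(() => new Service());
        Console.WriteLine(later() != null);   // True

        // Fail-early variant throws immediately for an unregistered type.
        try
        {
            container.LazyResolve<string>();
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);    // No registration for String
        }
    }
}
```

The fail-early check costs one extra dictionary lookup when the Func is handed out, and it reports a configuration mistake at wiring time rather than at first use.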
Coordinator
Mar 27, 2009 at 4:49 AM
Hi Matthew,
I finally sat down and took a good look at your branch.
For the record, I must say that right from the beginning, I didn't find Funq performing nearly as "bad" as you reported originally (it performed consistently 4-5x faster than Unity).

I did find that the perf. program was not doing a GC.Collect() between use cases, so that might have skewed some runs. I checked that in. Also, I noticed (weirdly enough) that around your 10k iterations number it gets a bit slower, recovering if you go past that number :|. No idea why, but for example, at 1k iterations, Funq is between 3-4x no DI (consistent with your Munq numbers, after I added some optimizations, see later), increases a bit around 5k iterations, reaches what looks to be a max at around 10k iterations, and goes down again to the original speed at, for example, 20k iterations. This is interesting, and I might run a more comprehensive sample across the spectrum for all frameworks and dump that to a .csv so we can graph it and see what's going on.
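A sketch of how collecting between use cases and sweeping the iteration count into CSV might look. It is not the checked-in perf. program: `SweepBenchmark`, `Measure` and `ToCsv` are hypothetical names, the `GC.WaitForPendingFinalizers()` call and second collection are an addition beyond the plain `GC.Collect()` mentioned above, and Stopwatch ticks are an assumed unit.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical harness; names and CSV layout are made up for illustration.
public static class SweepBenchmark
{
    public static long Measure(Action useCase, int iterations)
    {
        // Start each measurement from a clean heap, so garbage left behind
        // by the previous use case is not collected on this one's clock.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            useCase();
        watch.Stop();
        return watch.ElapsedTicks;
    }

    // Produces one CSV row per iteration count: name,iterations,ticks.
    public static string ToCsv(string name, Action useCase, int[] iterationCounts)
    {
        var csv = new StringBuilder("name,iterations,ticks\n");
        foreach (int count in iterationCounts)
        {
            csv.Append(name).Append(',')
               .Append(count).Append(',')
               .Append(Measure(useCase, count)).Append('\n');
        }
        return csv.ToString();
    }

    public static void Main()
    {
        // Stand-in for a container's Resolve call.
        Action useCase = () => new object().GetHashCode();
        Console.Write(ToCsv("Plain no-DI", useCase, new[] { 1000, 5000, 10000, 20000 }));
    }
}
```

Plotting ticks divided by iterations against the iteration count for each container would show whether the bump around 10k iterations is specific to one framework or an artifact of the harness.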

Ideas I took and implemented in the trunk:

1 - Change service key and entry to use fields
2 - Refactor service entry to manage initialization (cleaned up a bit the way it talks back to the container to track disposables)
3 - Refactored service entry to expose typed factory (to be used by 4.)
4 - Removed my generic factory invocation code which involved one more delegate creation/invocation upon object creation.

Thanks a lot for proposing these changes through your Munq!
The perf. has indeed increased significantly, and the code looks better now :)
Here are the numbers before and after the refactorings from 2-4:

// Before initialization refactoring
Running 1000 iterations for each use case.
Plain no-DI: 10222
Autofac: 209477
Ninject: 998059
StructureMap: 110605
Unity: 194795
Funq: 41351 (4x no DI)

// After initialization refactoring and removing factory delegate
// Now we have multiple resolveimpl for speed.
Running 1000 iterations for each use case.
Plain no-DI: 10573
Autofac: 221120
Ninject: 1000663
StructureMap: 105897
Unity: 200031
Funq: 33667 (3.1x no DI!)


I didn't replace Dictionary<> with HybridDictionary, as I didn't see a perf. improvement at all (it actually decreased in quite a few runs).

Thanks again!

/kzu


Coordinator
Mar 27, 2009 at 5:30 AM
And btw, make sure to check the new perf. awesomeness at http://www.tinyurl.com/diperformance

/kzu

Developer
Mar 27, 2009 at 10:22 PM
I'll check out why my Funq was so slow.  Probably used a debug version by mistake.

I'm eager to see what you did.  Sounds similar to what I was doing or working on.

I'm not sure if the version I checked in had separate ServiceKey types for named and unnamed registrations.  That improves the equality checking a little.  Not much, but it is called a lot.

Matthew
Coordinator
Mar 27, 2009 at 11:18 PM

Mmm... but being a Dictionary<>, I'd say the comparison is based on the hashcodes, which are calculated only once? :|

/kzu from mobile
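On the hash code question: `Dictionary<TKey, TValue>` caches the hash of each stored key, but it still calls `GetHashCode` on the key passed to every lookup, and then `Equals` against any stored key whose hash matches. A small counting sketch shows this. `CountingKey` is a hypothetical stand-in for a service key, not Funq's ServiceKey.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that counts how often the dictionary calls into it.
public sealed class CountingKey
{
    public static int HashCalls;
    public static int EqualsCalls;

    private readonly Type type;
    private readonly string name;

    public CountingKey(Type type, string name)
    {
        this.type = type;
        this.name = name;
    }

    public override int GetHashCode()
    {
        HashCalls++;
        return type.GetHashCode() ^ (name == null ? 0 : name.GetHashCode());
    }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        var other = obj as CountingKey;
        return other != null && type == other.type && name == other.name;
    }
}

public static class KeyLookupDemo
{
    public static void Main()
    {
        var map = new Dictionary<CountingKey, string>();
        map[new CountingKey(typeof(string), null)] = "entry";

        // Count only the work done by lookups, not by the insert above.
        CountingKey.HashCalls = 0;
        CountingKey.EqualsCalls = 0;

        string found;
        for (int i = 0; i < 3; i++)
            map.TryGetValue(new CountingKey(typeof(string), null), out found);

        Console.WriteLine(CountingKey.HashCalls);   // 3
        Console.WriteLine(CountingKey.EqualsCalls); // 3
    }
}
```

So every successful Resolve pays for one `GetHashCode` and at least one `Equals` on the key type, which is why a cheaper equality check can show up in a profile even though the gain per call is small.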
