On PSR-0 standards, namespaces and code (re-)use

When you work on several projects with different people, a set of standards that dictates code use and re-use is a good thing to have. PSR-0 is one such accepted standard.

It took me a while to realize, as it so often does, that many useful improvements to PHP mean a few steps forward and a few steps back. I’m going to list a few patterns that cause some conflict when implementing PSR-0.

Global variables are bad

It’s widely accepted in PHP that global variables are bad. Using global variables is a so-called anti-pattern; its use is greatly discouraged and frowned upon. From what I remember, the main reason this became discouraged over time was the “register_globals” option PHP used to offer. Using it was the cause of many notable security flaws in software all around.

Many other reasons exist why you shouldn’t use global variables - but there is still a need to share objects and variables between various places in any code base. Avoiding global variables makes this kind of sharing significantly clearer and comes with some other benefits too.

Static class methods

Using static class methods is one way to avoid global variables. Instead of storing your objects, configuration and what-have-you in the global scope, you can store them in a short-named class as static values. This lets you write “config::get('language')” or “registry::get('database_connection')”. A minimal and solid implementation of common, widely used static objects is highly expressive and useful.
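A minimal sketch of such a static container, assuming a hypothetical registry class with set() and get() methods (the names are illustrative and not taken from any library):

```php
<?php
// Sketch only: the class name and its methods are assumptions for illustration.
final class registry
{
    /** Shared values, keyed by name. */
    private static $items = array();

    public static function set($key, $value)
    {
        self::$items[$key] = $value;
    }

    public static function get($key, $default = null)
    {
        return array_key_exists($key, self::$items) ? self::$items[$key] : $default;
    }
}

registry::set('language', 'sl');
echo registry::get('language'), "\n";        // prints "sl"
echo registry::get('missing', 'n/a'), "\n";  // prints "n/a"
```

The state still lives in one shared place, but every read and write goes through a named class instead of the global scope.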

Namespaces and PSR-0

According to the PSR-0 document, a fully qualified class name must have the structure [\<Vendor Name>\(<Namespace>\)*<Class Name>]. This encourages code re-use, but only by encouraging instances - with little or no thought given to static objects that could be shared across your whole project, the way a global function or a static container class would be.
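To make the naming rule concrete, here is a rough sketch of how a PSR-0 class name maps to a file path: namespace separators become directory separators, and underscores in the class name (but not in the namespace) do too. The function name psr0_path and the Acme classes are made up for this example, and “/” is hard-coded instead of DIRECTORY_SEPARATOR to keep the output predictable.

```php
<?php
// Sketch only: psr0_path() is a hypothetical helper, not part of PHP or Composer.
function psr0_path($className)
{
    $className = ltrim($className, '\\');
    $path = '';

    $pos = strrpos($className, '\\');
    if ($pos !== false) {
        // Namespace part: each separator becomes a directory.
        $namespace = substr($className, 0, $pos);
        $className = substr($className, $pos + 1);
        $path = str_replace('\\', '/', $namespace) . '/';
    }

    // Class part: underscores also become directories.
    return $path . str_replace('_', '/', $className) . '.php';
}

echo psr0_path('\\Acme\\Settings\\Config'), "\n"; // Acme/Settings/Config.php
echo psr0_path('Acme\\Log\\Writer_File'), "\n";   // Acme/Log/Writer/File.php
```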

The kludge to work around this is to declare a minimal class in your project that just extends the fully namespaced PSR-0 class. The other ways have significant caveats. I say kludge because this appears to be the only way to create a globally scoped class with namespaces.

The example call [\vendor\namespace\config::get('language')] illustrates how “hard” it would be to use a namespaced static object across your project. You can always fall back to creating an alias with [use \vendor\namespace\config;] and use [config::get] as you did before - but you have to declare the alias in every file where you do this.
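A sketch of the wrapper kludge next to the fully qualified call. Everything here is hypothetical: Acme\Settings\Config stands in for a vendor class (“namespace” itself is a reserved word in PHP, so it can’t be used literally as a namespace segment), and both namespaces share one file only to keep the example self-contained.

```php
<?php
// Sketch only: Acme\Settings\Config is a made-up vendor class.
namespace Acme\Settings {
    class Config
    {
        private static $values = array('language' => 'en');

        public static function get($key, $default = null)
        {
            return isset(self::$values[$key]) ? self::$values[$key] : $default;
        }
    }
}

// Global namespace: the thin wrapper class that acts as a project-wide alias.
namespace {
    class config extends \Acme\Settings\Config
    {
    }

    echo \Acme\Settings\Config::get('language'), "\n"; // fully qualified call
    echo config::get('language'), "\n";                // short call via the wrapper
}
```

Both calls print “en”. The wrapper only needs to be declared once per project, while a `use` statement has to be repeated in every file.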

Ideally, a way to declare a global class alias would be a “clean-er” way to solve this; but as far as I know, this steps further away from traditional namespaces in other languages. Changes like this are something the PHP developers have been strongly against from the start - and I don’t expect that to change any time soon.

Final thoughts

The PSR-0 is a seminal document in the way it encourages code re-use, but it falls short when it comes to actual shared use of this code. Absolutely no thought has been given to static-access patterns with PSR-0, and I don’t see a nice way of reconciling this because of issues inherent in the implementation of PHP namespacing, especially its aliasing features. PHP is trying too hard to be “enterprise” to realize that being “enterprise” actually means abandoning the simplicity and expressiveness that are, in my eyes, a few of the reasons that PHP became so popular in the first place.

Many projects don’t have the share-nothing architecture that PSR-0 encourages. If you’re considering adopting PSR-0 as your standard of choice, do so with some thought - the stylistic parts of the higher-numbered PSR-1 and PSR-2 standards afford some benefits when working with several programmers in teams, but implementing PSR-0 comes with significant caveats: some common practices in the PHP community are now considered anti-patterns.

Composer and Packagist

If you’re still considering using PSR-0, or even if you’re not, there are a few projects that make your life a little bit better. You can use Composer to provide an update mechanism for the code shared between your projects. You can use the public Packagist repository, or you can create your own with any of these: Packagist, Satis or my own Composer-sentinel.

May all your code be great.

- Tit Petric

Batch resolving of promises

I tend to have a lot of development ideas stemming from repetitive workloads or from an optimization standpoint. I tend to obsess over inefficient code structures in both. I’ve literally had dreams that provided me with answers which I implemented during the day. If only we could code at night during sleep. In retrospect, the Pareto principle applied to that subconsciously-influenced code base, meaning 80% of its usage was fine, and 20% was outside of the scope I was trying to solve and introduced other problems. More about that some other time.

We have a fairly complex setup over at RTV Slovenia. The landing page, which takes the majority of all requests, is constructed from a variety of data sources. There are news items, comment counts, menu items, static content, recent items in the social section of the web site, video news and a lot of relationships that make the whole mess the most complex part of the web site by a wide margin.

There is little chance of rewriting it. But how to optimize it without throwing it away is an interesting logical problem. One of the common optimization techniques we use is to group our data together when possible. Given the traditional programming flow, this is sometimes quite tedious to optimize globally, so instead the optimizations happen on lower levels - display items for one section, fetch all items, fetch all related comment counts, fetch all related video news,… But we display about 10 sections of news items. We could fetch all news items in one bulked request, but it would take some significant refactoring. And there are still the other data sources we would need to worry about.

The concept of Futures and Promises seems to be a good solution to this problem. A Promise is defined as a deferred value. In practice this is an object whose value can be resolved at a later time. It seems perfect; all we need to add to extend this model is:

1) Promise relationships
2) Batch resolving of multiple promises

When I say “relationships” I’m trying to approach this from a data-driven standpoint, and not program flow per se. I don’t want to use the then() method to trigger resolving of promises, and I don’t want to keep track of the promises in a sea of closures when they resolve.

A Promise containing more Promises, which in turn contain more Promises, is a good way to specify relationships between promises. It’s not quite that simple: you still need some program flow when creating promises, but this is nothing that can’t be solved with a getPromises() method and some recursion. A Promise defines a set of Promises to be resolved. A news item would define a comment count promise and return it here.
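A rough sketch of that idea, not the code from my repository: the class names (Promise, Section, NewsItem, CommentCount) and the collectPromises() helper are assumptions made for this example. Each promise reports its related promises through getPromises(), and a short recursive function flattens the tree.

```php
<?php
// Sketch only: all class and function names here are hypothetical.
abstract class Promise
{
    public $value = null;

    /** Related promises that belong to this one; none by default. */
    public function getPromises()
    {
        return array();
    }
}

class CommentCount extends Promise
{
    public $newsId;

    public function __construct($newsId)
    {
        $this->newsId = $newsId;
    }
}

class NewsItem extends Promise
{
    public $id;
    private $commentCount;

    public function __construct($id)
    {
        $this->id = $id;
        $this->commentCount = new CommentCount($id);
    }

    // A news item declares its comment count as a related promise.
    public function getPromises()
    {
        return array($this->commentCount);
    }
}

class Section extends Promise
{
    private $items = array();

    public function __construct(array $newsIds)
    {
        foreach ($newsIds as $id) {
            $this->items[] = new NewsItem($id);
        }
    }

    public function getPromises()
    {
        return $this->items;
    }
}

/** Walk the tree depth-first and return every promise in it as a flat list. */
function collectPromises(Promise $root)
{
    $found = array($root);
    foreach ($root->getPromises() as $child) {
        $found = array_merge($found, collectPromises($child));
    }
    return $found;
}

$all = collectPromises(new Section(array(1, 2, 3)));
echo count($all), "\n"; // 7: one section, three news items, three comment counts
```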

We stray from the traditional use of Promises here. Using Promise objects in this way gives us the ability to batch resolve them, which you don’t get from the common Promise programming patterns. And we’re maintaining the data relationships between them.

All that is left to do is resolve the promises in the final data tree. My approach was to traverse the tree and index the promises by class name in a final list. This way it was possible to resolve a list of promises using a resolveAll($promises) method defined in a specific promise class. This is the batching function: it takes all the promises of the same type and resolves them using one function call, taking care of both fetching the data and resolving the promises. You would do this in MySQL with a single query using an IN (...) set of keys, or you could use memcache::get or redis::mget.
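The batching step could look roughly like the sketch below. It is an illustration under assumptions, not the repository code: resolveBatched() and fakeCommentCounts() are invented names, and the fake data source stands in for whatever single query or multi-get call you would actually make.

```php
<?php
// Sketch only: class names, helper names and the fake data source are assumptions.
abstract class Promise
{
    public $key;
    public $value = null;

    public function __construct($key)
    {
        $this->key = $key;
    }
}

class CommentCount extends Promise
{
    /** Counts how often the (fake) data source was hit. */
    public static $calls = 0;

    /** Resolve every promise of this type with a single lookup. */
    public static function resolveAll(array $promises)
    {
        self::$calls++;
        $ids = array();
        foreach ($promises as $p) {
            $ids[] = (int) $p->key;
        }
        // A real implementation might run one query with an IN (...) list here.
        $rows = fakeCommentCounts(array_unique($ids));
        foreach ($promises as $p) {
            $p->value = isset($rows[$p->key]) ? $rows[$p->key] : 0;
        }
    }
}

/** Stand-in data source: pretends every news item has id * 10 comments. */
function fakeCommentCounts(array $ids)
{
    $rows = array();
    foreach ($ids as $id) {
        $rows[$id] = $id * 10;
    }
    return $rows;
}

/** Group a flat list of promises by class name and resolve each group in one call. */
function resolveBatched(array $promises)
{
    $groups = array();
    foreach ($promises as $p) {
        $groups[get_class($p)][] = $p;
    }
    foreach ($groups as $class => $group) {
        $class::resolveAll($group); // static call on a class name, PHP 5.3+
    }
}

$promises = array(new CommentCount(1), new CommentCount(2), new CommentCount(3));
resolveBatched($promises);
echo $promises[1]->value, "\n";   // 20
echo CommentCount::$calls, "\n";  // 1: three promises, one lookup
```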

You can check out my attempt at a solution here:

https://github.com/titpetric/research-projects/tree/master/php-resolve

So, while the landing page would still need significant refactoring, this is a step in the right direction. The resulting data tree is perfect because it is resolved with no data duplication and the maximum amount of batching. Whatever data source you use, chances are it would only add one SQL query to get all results. And optimizing one SQL query is much easier than having to optimize 20 of them across your complete application stack. It is also nice to reduce the number of SQL queries you’re working with in case you need to implement sharding, move the database or make some other data management changes.

Additional thoughts:

The approach is sequential and you’re given your data tree directly after execution of the resolve / resolveAll calls. There exists an opportunity to fetch data asynchronously, depending on the source of your data. If you’re consuming API responses over HTTP, SQL queries over MySQL or any kind of data over a non-blocking connection, the resolving could be adapted to take advantage of this.

Fetching the data in such a way is a nice optimization, but it needs to be implemented over your complete MVC solution to really take advantage of the benefits. The goal is to come as close as possible to complete coverage, so none of your data calls get duplicated.

Some thought needs to be put into how your MVC framework can live with this data model, and where it should be avoided. The thing to keep in mind is that this is basically an efficient model for fetching data while keeping the relationships between data. This is somewhat of a superset of DAO / DAL logic, since it approaches the data from a global viewpoint, and not from the viewpoint of a specific data structure.

p.s. a significant pitfall here is also the PHP engine. I’m sure the performance could/would increase dramatically if this were running in a JVM. While the benchmark is not bad, the 95th percentile shows significant overhead in the initial runs, before PHP does some of its pre-allocation magic to speed things up.

- Tit Petric

When told that the server with 2.5 years uptime was rebooted.

devopsreactions:

by Tuwi

# ssh falcon.mmc.lan uptime
 12:50:03 up 2038 days, 22:15,  0 users,  load average: 0.08, 0.02, 0.01

We’re good :)

- Tit Petric

titpetric:

Wilson Miner - When We Build

We are tool builders. We make things. We make things that change our lives and we make things that change the world. This is a long lasting tradition. We make our tools and our tools shape us.

Samuel L. JSON

Sometimes it’s not just about optimizing CPU time away. Looking at the details, I could optimize away a badly written SQL query along with a more trivial big-O problem with regard to sorting video clips that I got out of serialized data. All from a few carefully placed calls. Reducing network bandwidth is just as important as CPU time.

- Tit Petric

In process performance statistics with Redis

Every once in a while you need to do a sanity check of your code, how it performs and what you can do to improve it. This is most apparent with a code-base that is developed and refactored over a period of several years. There can be several problems that negatively impact performance due to dependency issues or legacy code that should have been removed but was forgotten during a refactoring sprint.

In effect, everybody is looking for low-hanging fruit. Perhaps a slow SQL query, or just some processing which is taking O(P*N*M) due to sloppy code. The trick is to figure out where your code is wasting time. Sure, you can profile code with tools like xhprof, but the inherent verbosity can leave you with a mess of data which you’ll never get around to analyzing. I had gigabytes of profiler logs to prove it.

Lucky for me, I use check points in our own custom CMS software. Each time a specific core method or piece of functionality is used, a function gets called along with a descriptive argument. Think of them as assertions, except there are far fewer of them, up to about 30 per page load, depending on the complexity of the page being loaded.

It took about 3 hours to write tooling to log this data into Redis. Or more specifically, since I am only interested in real-time data, I pushed each request and each check point call into a Redis channel (Pub/Sub). All that was left to do after that was write some tooling (colors, yay!) that listened to the data, processed it and tried to illustrate where the time was being spent. This was the final result.
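The publishing side could be sketched like this. The Checkpoints class, the “checkpoints” channel name and the message format are assumptions for illustration, not the tooling described above. The publisher is passed in as a callable so the sketch runs without a Redis server; with the phpredis extension you could pass array($redis, 'publish') instead.

```php
<?php
// Sketch only: class name, channel name and message format are assumptions.
final class Checkpoints
{
    private $publish;
    private $requestId;
    private $start;

    /** $publish is any callable that accepts ($channel, $message). */
    public function __construct($publish, $requestId)
    {
        $this->publish = $publish;
        $this->requestId = $requestId;
        $this->start = microtime(true);
    }

    /** Publish one check point with the time elapsed since the request started. */
    public function mark($label)
    {
        $message = json_encode(array(
            'request' => $this->requestId,
            'label'   => $label,
            'elapsed' => microtime(true) - $this->start,
        ));
        call_user_func($this->publish, 'checkpoints', $message);
    }
}

// Stand-in publisher that just collects messages in an array.
$sent = array();
$publish = function ($channel, $message) use (&$sent) {
    $sent[] = array($channel, $message);
};

$cp = new Checkpoints($publish, uniqid());
$cp->mark('bootstrap');
$cp->mark('firstpage blocks');
echo count($sent), " messages published\n";
```

A separate listener process can then subscribe to the channel and aggregate the elapsed times per label.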

As far as low hanging fruit goes, this was good enough to shave 40ms over all requests, due to some legacy code that was still being used. Basically we included a cache file with menu data, which wasn’t even used for a majority of requests. And the same cache file was loaded explicitly later in the execution, when the data was actually being used. The change, even if slight, is visible on each cluster machine.

And even looking at the output above I can see some clear performance problems in the “firstpage blocks” section, between admin_menu_dao and mojsplet_dao. A whole 0.34 seconds are getting wasted there, which represents about 55% of total time. I know the firstpage is a complex module, but I also know this could be shaved down to about 0.2 sec total without goat sacrifices.

I’m guessing there’s a lot of redundant processing and inefficient big-O code in there, along with some recursion from what I remember. Either way, the affected code base in that area is perhaps 50-100 KB, so it shouldn’t be hard to track down the badly performing code and optimize it away. Not bad for 3 hours of work and about a pint of coffee.

- Tit Petric

Don’t you just love those late night STP dilemmas? “If I enable this port, will I need to drive to the server room and unplug the cable?”

WhooPHPsy…

Recently we had an issue on one system. There were just too many API requests for it to be normal. Nothing critical, but as I was doing some other migrations it caught my eye and, being the paranoid person I am, I could not let it go. When I realized that the traffic originated from “inside” (some application running in our cloud) the real head scratching began. Then I pinpointed the application and figured out that it had to be something with caching (a low traffic page that generates just too many API calls? you can see where this is going). But all the code looked ok! Sort of…

class CacheImpl extends Cache {

  function Cache() {
    parent::__construct();
    $this->_enabled = class_exists("Memcache");
  }

You see it, right? I didn’t for a while. And neither did the author, not to name any names ;)

Lesson: use __construct()
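For the record, the problem is that a method named Cache() inside CacheImpl is just an ordinary method: an old-style constructor would have to match the class name, CacheImpl. It never runs, so _enabled is never set and the cache is never used. A sketch of the fix, with a made-up Cache base class standing in for the real one:

```php
<?php
// Sketch only: this Cache base class is a stand-in for the real one.
class Cache
{
    protected $_enabled = false;

    public function __construct()
    {
    }

    public function isEnabled()
    {
        return $this->_enabled;
    }
}

class CacheImpl extends Cache
{
    // __construct() runs on every "new CacheImpl", whatever the class is called.
    public function __construct()
    {
        parent::__construct();
        $this->_enabled = class_exists("Memcache");
    }
}

$cache = new CacheImpl();
var_dump($cache->isEnabled() === class_exists("Memcache")); // bool(true)
```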

crudcomic:

Happy Halloween
Remember to backup your data, or your phone will ring at 2am and you’ll die seven days later - or worse, you’ll have to explain to your customers why their data is missing.

A brain dump of programmers’ tips, tricks and thoughts on programming, technology and life.

Written by:
Tit Petric
@titpetric
Damjan Cvetko
@damjancvetko
Peter Lavric
@peterlavric
