Series: Software Design For Normal People

Writing simple, maintainable software.

  • Hacking PHP’s WeakMap for Value Object D×

    This entry is part 1 of 2 in the series Software Design For Normal People

    This guide will allow you to use value objects in PHP without a special $obj->equals() method. Instead, you can use the === or == operator to compare two value objects. This guide will also show you how to use a WeakMap to ensure that you don’t create multiple instances of the same value object and how to hack WeakReferences to do the same when backing scalar values.

    The Base Value

    Before we get started, we need to figure out what our Value Objects will represent. In this example, we will use time as our value. Our values will be wrapped numbers, but they could also be strings or any other arbitrary value. If you have a specific, predefined set of numbers or strings, this guide isn’t for you; use enums instead. For something like time, a user should be able to say, “Are 5 seconds equal to 5,000 milliseconds?” The answer should be “Yes.” They should be able to do this with the regular comparison operators, without calling special methods.

    Before we write any code, we need to decide on our base value. In this example, we will use milliseconds to keep things simple.

    So, let’s dive into some code! Here’s the basic structure of the class:

    readonly class Time
    {
        private int $milliseconds;
    
        public function asSeconds(): float {
            return $this->milliseconds / 1000;
        }
    
        public function asMilliseconds(): float {
            return $this->milliseconds;
        }
    
        public function asMinutes(): float {
            return $this->milliseconds / 60000;
        }
    }

    This class is pretty straightforward. It has a single property, $milliseconds, and three methods to convert the value to seconds, minutes, and milliseconds. We could add more methods, but this is enough for now.

    Creation

    To create proper value objects for our purposes, we need to use a factory. Direct access to the constructor would prevent us from enforcing the value’s identity, since calling new Time(5000) twice would create two different objects.

    readonly class Time
    {
        private function __construct(private int $milliseconds) {}
    
        public static function fromMilliseconds(int $milliseconds): Time {
            return new Time($milliseconds);
        }
    
        public static function fromSeconds(float $seconds): Time {
            // Round explicitly: the constructor expects an int
            return new Time((int) round($seconds * 1000));
        }
    
        public static function fromMinutes(float $minutes): Time {
            return new Time((int) round($minutes * 60000));
        }
    
        /* ... */
    }

    Now, we can create a Time object like this:

    $time = Time::fromMilliseconds(5000);

    However, you’ll notice that Time::fromSeconds(1) !== Time::fromMilliseconds(1000). This is because we’re creating a new instance of Time every time we call a factory method.
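
    To see the problem in isolation, here is a minimal sketch. NaiveTime is a hypothetical stand-in for the Time class at this stage (private constructor, plain factories), and it assumes PHP 8.1 or later for readonly properties:

```php
<?php
// Hypothetical stand-in for Time at this point in the guide:
// a private constructor plus factories, with no instance sharing yet.
final class NaiveTime
{
    private function __construct(private readonly int $milliseconds) {}

    public static function fromMilliseconds(int $milliseconds): self
    {
        return new self($milliseconds);
    }

    public static function fromSeconds(float $seconds): self
    {
        return new self((int) round($seconds * 1000));
    }
}

$a = NaiveTime::fromSeconds(1);
$b = NaiveTime::fromMilliseconds(1000);

// Same value, but every factory call built its own instance.
var_dump($a === $b); // bool(false)
```

    Both objects wrap 1000 milliseconds, yet the identity check fails because they are separate instances.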

    The Weak Map

    To solve this problem, we can use a WeakMap. A WeakMap is a map with weak references to its keys, meaning that if the key isn’t used anywhere, the WeakMap will remove the key and its value from the map.
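
    If you haven’t used a WeakMap before, this small sketch shows the behavior described above. It uses a plain stdClass object as the key (an arbitrary choice for illustration) and needs PHP 8.0 or later:

```php
<?php
// An entry in a WeakMap lives only as long as its key object does.
$cache = new WeakMap();

$key = new stdClass();
$cache[$key] = 'metadata for the key';

echo count($cache), PHP_EOL; // 1: $key is still referenced

unset($key);                 // drop the last strong reference to the key

echo count($cache), PHP_EOL; // 0: the entry went away with its key
```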

    What we need to do is something like this:

    1. See if any Time objects have the same value in the WeakMap.
    2. If there are, return the existing object.
    3. If there aren’t, create a new Time object and store it in the WeakMap.

    The actual implementation would look more or less like this:

    function getValue(int $milliseconds): Time {
        static $weakmap;
        $weakmap ??= new WeakMap();
        return $weakmap[$milliseconds] ??= Time::fromMilliseconds($milliseconds);
    }

    However, you’ll note that you can’t put scalar values in a WeakMap, as they’re not objects. If we’re wrapping regular objects, the above algorithm is all we need. If we’re wrapping scalar values, we must be more creative: we need an array of WeakReferences, and we must ensure that the array behaves like a WeakMap.
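
    A quick way to confirm the limitation is to try an int key. This sketch only checks that a TypeError is thrown; the variable names are made up for the example:

```php
<?php
$map = new WeakMap();

try {
    $map[5000] = 'five seconds'; // an int is not an object
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;            // WeakMap refuses non-object keys
}

var_dump($rejected); // bool(true)
```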

    Array of WeakReference

    As mentioned earlier, we must use an array of WeakReferences to store our values. So, let’s get started:

    1. Create a static array that we use to hold our wrapped values.
    2. Create a WeakReference for each wrapped value we store in the array.
    3. When we need a wrapped value, we check if the raw value is in the array.
    4. If it is, we return the wrapped value.
    5. If it isn’t, we create a new wrapped value from the raw value, store it in the array, and return the wrapped value.
    6. On destruction, remove the raw value from the array.

    Here’s the implementation with comments:

    class Time {
        private static array $map = [];
    
        private static function getValue(int $milliseconds): Time {
            // Attempt to get the value from the array, and if it exists, get the
            // value from the WeakReference, otherwise, create a new one
            $realValue = (self::$map[$milliseconds] ?? null)?->get() ?? new self($milliseconds);
    
            // Store the value in the array, even if another reference exists
            self::$map[$milliseconds] = WeakReference::create($realValue);
    
            return $realValue;
        }
    
        public function __destruct(){
            // The values no longer exist, and we can delete the value from the array
            unset(self::$map[$this->milliseconds]);
        }
    
        /** ... */
    }

    This implementation is a bit more complex than the WeakMap but still pretty straightforward. We create a new Time object if one doesn’t exist yet; otherwise, return the existing one. Upon destroying the object, we remove it from the array since the WeakReference is about to be empty.

    You may notice the odd way the “getting from the array” is structured and that we store the result in a $realValue variable. There are several reasons for this.

    1. This is actually pretty darn fast.
    2. We have to hold a reference to the created object, or it will be removed from our array before we ever return it.
    3. Just because we have a WeakReference, doesn’t mean it holds a value.

    Even with this knowledge, you might be tempted to rewrite it like so:

    $realValue = self::$map[$milliseconds] ?? WeakReference::create(new self($milliseconds));

    However, because there isn’t anything but the WeakReference holding a reference to new self(), it will immediately be deleted. The provided listing is an efficient way to extract a value if one exists and hold a reference to it in $realValue.
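
    The same effect can be reproduced outside the class. This is a minimal sketch with stdClass standing in for the value object, contrasting a weak reference with no other owner against one whose target is still held by a variable:

```php
<?php
// Nothing else holds the new object, so it is destroyed right after
// WeakReference::create() returns and get() has nothing to give back.
$dangling = WeakReference::create(new stdClass());
var_dump($dangling->get()); // NULL

// While a variable holds the object, the weak reference resolves to it.
$object = new stdClass();
$held = WeakReference::create($object);
var_dump($held->get() === $object); // bool(true)

unset($object);             // the last strong reference is gone
var_dump($held->get());     // NULL
```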

    Comparisons

    Using a consistent base value means we can rely on regular PHP comparisons without special magic. PHP compares two objects of the same type by inspecting their properties. Since there is only one property and it is numeric, comparisons like the following simply “just work”:

    Time::from(TimeUnit::Seconds, 1) > Time::from(TimeUnit::Milliseconds, 1)

    Thus, the only special operators we need are mathematical operators.
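
    The property-by-property comparison can be checked without any of the WeakReference machinery. Millis below is a hypothetical single-property class used only for this sketch (PHP 8.1 or later):

```php
<?php
// Hypothetical single-property class standing in for Time.
final class Millis
{
    public function __construct(public readonly int $value) {}
}

$oneSecond = new Millis(1000);
$oneMilli = new Millis(1);

// Objects of the same class are compared by their property values.
var_dump($oneSecond > $oneMilli);         // bool(true)
var_dump($oneMilli < $oneSecond);         // bool(true)
var_dump($oneSecond == new Millis(1000)); // bool(true)
```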

    Full Listing

    Before we get to the full listing, there are a few more small improvements:

    1. We should make the class final. I’m usually against final classes, but we must do careful manipulations in this case, and a child class may override important behaviors which may cause memory leaks. I’ll do that now.
    2. We can use enums for units to simplify our from/to logic. I’ll go ahead and do this as well.
    3. I’ve updated my Time Library to use this.

    Here’s the full listing:

    enum TimeUnit : int {
        case Milliseconds = 1;
        case Seconds = 1000;
        case Minutes = 60000;
    }
    
    final class Time {
        private static array $map = [];
        
        private static function getValue(int $milliseconds): Time {
            // Attempt to get the value from the array, and if it exists, get the
            // value from the WeakReference, otherwise, create a new one
            $realValue = (self::$map[$milliseconds] ?? null)?->get() ?? new self($milliseconds);
            
            // Store the value in the array, even if another reference exists
            self::$map[$milliseconds] = WeakReference::create($realValue);
            
            return $realValue;
        }
        
        public function __destruct(){
            // The values no longer exist, and we can delete the value from the array
            unset(self::$map[$this->milliseconds]);
        }
        
        private function __construct(private readonly int $milliseconds) {}
        
        public static function from(TimeUnit $unit, float $value): Time {
            return self::getValue((int) round($unit->value * $value));
        }
        
        public function as(TimeUnit $unit): float {
            return $this->milliseconds / $unit->value;
        }
    }
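
    To watch the whole technique work end to end, including the cleanup in __destruct, here is a sketch that applies it to a different value. Distance, its factory names, and the liveCount() helper are all hypothetical and exist only for this demonstration; the lookup is also written out as an explicit if rather than the one-liner above:

```php
<?php
// Hypothetical example: the same technique applied to a distance
// stored in whole millimeters.
final class Distance
{
    /** @var array<int, WeakReference<Distance>> */
    private static array $map = [];

    private function __construct(private readonly int $millimeters) {}

    public static function fromMillimeters(int $millimeters): self
    {
        // Reuse the live instance if the weak reference still resolves.
        $existing = (self::$map[$millimeters] ?? null)?->get();
        if ($existing !== null) {
            return $existing;
        }

        // Keep a strong reference in $created until we return it.
        $created = new self($millimeters);
        self::$map[$millimeters] = WeakReference::create($created);

        return $created;
    }

    public static function fromMeters(float $meters): self
    {
        return self::fromMillimeters((int) round($meters * 1000));
    }

    public function __destruct()
    {
        unset(self::$map[$this->millimeters]);
    }

    // Hypothetical helper, only here so we can observe the cleanup.
    public static function liveCount(): int
    {
        return count(self::$map);
    }
}

$a = Distance::fromMeters(1.5);
$b = Distance::fromMillimeters(1500);

var_dump($a === $b);             // bool(true): one shared instance
var_dump(Distance::liveCount()); // int(1)

unset($a, $b);                   // the last strong references are gone
var_dump(Distance::liveCount()); // int(0): __destruct removed the entry
```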

    With the advent of property hooks in PHP 8.4, it might be simpler to mix the behavior and provide actual hooked properties:

    final class Time {
        private static array $map = [];
    
        public float $seconds {
            get => $this->milliseconds / 1000.0;
        }
    
        public float $minutes {
            get => $this->seconds / 60.0;
        }
    
        private static function getValue(int $milliseconds): Time {
            // Attempt to get the value from the array, and if it exists, get the
            // value from the WeakReference, otherwise, create a new one
            $realValue = (self::$map[$milliseconds] ?? null)?->get() ?? new self($milliseconds);
    
            // Store the value in the array, even if another reference exists
            self::$map[$milliseconds] = WeakReference::create($realValue);
    
            return $realValue;
        }
    
        public function __destruct(){
            // The values no longer exist, and we can delete the value from the array
            unset(self::$map[$this->milliseconds]);
        }
    
        private function __construct(public readonly int $milliseconds) {}
    
        public static function from(TimeUnit $unit, float $value): Time {
            return self::getValue((int) round($unit->value * $value));
        }
    }

    Conclusion

    I’m really excited to share this approach with you, as it enables some nice benefits when working with value objects. You no longer need special ->equals() methods, so you can’t accidentally forget to use them. You can ask questions like assert(Time::from(TimeUnit::Seconds, 10) === Time::from(TimeUnit::Milliseconds, 10000)) and get the expected result.

    I hope you find this guide useful and can use it in your projects. If you have any questions or suggestions, please email me or post them in the comments section below.

    Playground: https://3v4l.org/dVEOP/rfc#vrfc.property-hooks

  • Classifying Pull Requests: Enhancing Code Review Effectiveness

    This entry is part 2 of 2 in the series Software Design For Normal People

    In today’s rapidly evolving software development landscape, increasingly complex systems present new challenges in code reviews. As projects grow in size and scope, efficient code review processes become essential to maintain code quality and developer productivity. By understanding and categorizing pull requests (PRs), developers can streamline reviews, improve code quality, and foster better collaboration. Research supports the idea that clear classification of PRs can lead to more efficient and productive code reviews. [1][2] In this essay, we will explore the classification of pull requests into three fundamental categories (Features, Refactors, and Updates) and discuss their impact on the code review process.

    Class I: Features

    Features represent the core advancements and additions to a codebase, introducing new functionalities or modifying existing ones. These changes are crucial for software evolution and enhancing user experiences. Class I PRs are characterized by:

    1. Predominantly line additions or deletions.
    2. Modifications to public interfaces such as arguments, public methods, fields, or literal interfaces.
    3. Updates or additions to tests to ensure the new or altered functionality works as intended.

    Features are the driving force behind software development, enabling products to meet new requirements and improve user satisfaction.

    Example

    Imagine a typical Feature PR involving the addition of a new API endpoint. This PR would include new code for the endpoint, updates to the interface to accommodate the new functionality, and tests to ensure the endpoint works correctly.

    Class II: Refactors

    Refactors are internal modifications aimed at improving code readability, maintainability, or performance without altering its external behavior. These changes make the codebase cleaner and more efficient. Class II PRs typically include:

    1. A mix of additions and deletions, often balancing each other out.
    2. No changes to public interfaces, ensuring the software’s external behavior remains consistent.
    3. (Usually) No need for test updates, as the functionality remains unchanged.

    Refactors are essential for maintaining a healthy codebase, making it easier for developers to understand and work with the code over time.

    Example

    A Refactor PR might focus on improving code readability by renaming variables and functions to more meaningful names, reorganizing methods for better logical flow, and removing redundant code. These changes do not alter the functionality but make the code easier to maintain.

    Class III: Updates

    Updates encompass changes to existing features, which may include bugfixes, performance improvements, or modifications to enhance existing functionality. These changes are observable and typically require updates to tests to confirm that the issue has been resolved or the improvement is effective. Class III PRs are characterized by:

    1. An almost equal number of additions and deletions.
    2. Potential changes to public interfaces to implement the necessary improvements.
    3. Necessary updates to tests to verify the changes.

    Updates are crucial for maintaining software reliability and user satisfaction, ensuring that the software performs as expected.

    Example

    Consider an Update PR that improves the performance of an existing feature causing delays under specific conditions. This PR would involve code changes to optimize the feature, possibly updates to public interfaces if necessary, and new or updated tests to ensure the improvement is effective.
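
    The characteristics of the three classes can be turned into a rough triage heuristic. The sketch below is only an illustration: PullRequestStats, classifyPullRequest(), and the 0.3 balance threshold are assumptions made up for this example, not part of any real tool, and a human reviewer should always have the final say.

```php
<?php
// Hypothetical summary of a pull request's diff.
final class PullRequestStats
{
    public function __construct(
        public readonly int $additions,
        public readonly int $deletions,
        public readonly bool $changesPublicInterface,
        public readonly bool $changesTests,
    ) {}
}

// Rough heuristic based on the three class descriptions.
// The 0.3 threshold for "roughly balanced" is an arbitrary assumption.
function classifyPullRequest(PullRequestStats $pr): string
{
    $total = $pr->additions + $pr->deletions;
    $imbalance = $total === 0
        ? 0.0
        : abs($pr->additions - $pr->deletions) / $total;
    $balanced = $imbalance < 0.3;

    if (!$pr->changesPublicInterface && !$pr->changesTests) {
        return 'Class II: Refactor';
    }
    if ($balanced && $pr->changesTests) {
        return 'Class III: Update';
    }
    if (!$balanced && $pr->changesPublicInterface && $pr->changesTests) {
        return 'Class I: Feature';
    }

    return 'Mixed or unclear';
}

// New API endpoint: mostly additions, new interface, new tests.
echo classifyPullRequest(new PullRequestStats(400, 10, true, true)), PHP_EOL;
// Renames and cleanup: balanced diff, no interface or test changes.
echo classifyPullRequest(new PullRequestStats(120, 115, false, false)), PHP_EOL;
// Performance fix: balanced diff with updated tests.
echo classifyPullRequest(new PullRequestStats(60, 55, false, true)), PHP_EOL;
```

    Anything that falls through to “Mixed or unclear” is a candidate for splitting into smaller PRs.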

    The ideal pull-request

    An ideal PR is focused and belongs to a single primary class of changes. It should be small, manageable, and dedicated to one purpose, whether it’s adding a feature, refactoring code, or updating an existing feature. Such PRs are easier to review, test, and integrate into the codebase, reducing the risk of introducing new issues.

    Challenges and mixed-class PRs

    While the ideal PR belongs to a single class, mixed-class PRs are common in larger projects. These PRs can be more challenging to review due to their complexity and the cognitive load they impose on reviewers.

    Class IV-A: Update-Refactors

    These PRs combine updates with refactoring changes. They often face criticism due to the difficulty in distinguishing between update-related changes and refactor-related changes. The lack of tests for the refactoring portion makes the diff hard to follow, complicating the review process.

    Why problematic?

    Update-Refactor PRs blend the urgency and importance of updates with the less critical, but still valuable, refactoring tasks. This combination can obscure the main purpose of the PR, making it difficult for reviewers to prioritize their focus. The need to verify both the updates and the refactoring changes can lead to longer review times and increase the likelihood of errors slipping through. Additionally, the cognitive load increases as reviewers must switch contexts between understanding the update and assessing the refactoring efforts.

    Class IV-B: Feature-Refactors

    Similar to IV-A, these PRs include both new features and refactoring. They can be challenging to review as the introduction of new functionality is intertwined with internal code improvements, making it hard to assess each aspect independently.

    Why problematic?

    Feature-Refactor PRs can be problematic because they mix strategic, high-level changes with tactical, low-level improvements. This dual focus can dilute the reviewer’s attention, making it harder to ensure that the new feature works correctly and that the refactoring is effective. Reviewers may miss critical interactions between the new feature and the refactored code, leading to potential integration issues or unforeseen bugs.

    Class IV-C: Update-Features

    These PRs are particularly problematic as they attempt to update existing features while simultaneously adding new features. They are usually large and complex, requiring significant effort to understand the interplay between the updates and the new functionality.

    Why problematic?

    Update-Feature PRs are the most challenging because they require reviewers to juggle understanding the update, verifying the change, and evaluating the new feature. This combination can lead to confusion, as reviewers may struggle to separate the different types of changes and their impacts. The complexity of these PRs can result in longer review times, higher cognitive load, and a greater chance of introducing new bugs or missing critical issues.

    Class V: Balls of Mud

    Occasionally, a PR mixes all types of changes together (features, refactors, and updates), resulting in an unwieldy and massive diff. These PRs are the hardest to review, often leading to confusion and potential oversight of important details.

    Why problematic?

    Balls of Mud PRs are essentially a worst-case scenario in code reviews. They present an overwhelming number of changes that can obscure the purpose and rationale behind each modification. Reviewers can become bogged down in the sheer volume of changes, leading to review fatigue and the potential for important issues to be overlooked. The lack of focus in these PRs can also make it difficult to ensure that each change is adequately tested and integrated, increasing the risk of regression bugs and integration problems.

    Strategies for managing mixed-class PRs

    • Break Down PRs: Whenever possible, break down mixed-class PRs into smaller, focused PRs. This makes each PR easier to review and ensures that changes are more manageable.
    • Clear Commit Messages: Use clear and descriptive commit messages to separate concerns within a single PR. This helps reviewers understand the purpose and impact of each change.
    • Detailed Documentation: Provide detailed documentation and comments explaining the changes, especially for mixed-class PRs. This aids reviewers in understanding the rationale behind the modifications.
    • Multiple Reviewers: Involve multiple reviewers with different expertise areas to handle complex, mixed-class PRs. This can help distribute the cognitive load and ensure a more thorough review.

    Classifying pull requests into distinct categories significantly enhances the code review process by improving focus, efficiency, and collaboration, while maintaining high code quality. By aiming for focused, single-class PRs, developers can reduce review times and improve the overall quality of the codebase. However, managing mixed-class PRs remains a challenge that requires careful attention to detail and clear communication among development team members.

    written and researched with the help of A.I.

    1. Automated Code Review Comment Classification to Improve Modern Code Reviews | SpringerLink
    2. What Are They Talking About? Analyzing Code Reviews in Pull-Based Development Model | Journal of Computer Science and Technology (springer.com)