by using an abstraction mechanism such as a function.
=== Applying library code ===
Copying and pasting is also done by experienced programmers, who often keep their own libraries of well-tested, ready-to-use code snippets and generic algorithms that are easily adapted to specific tasks.

Being a form of code duplication, copy-and-paste programming has some intrinsic problems; these are exacerbated if the code does not preserve any semantic link between the source text and the copies. In that case, if changes are needed, time is wasted hunting for all the duplicate locations. (This can be partially mitigated if the original code and/or the copy are properly commented; however, even then the same edits must still be made multiple times. Also, because code maintenance often omits updating the comments, comments describing where to find remote pieces of code are notorious for going out of date.)

Adherents of object-oriented methodologies further object to the "code library" use of copy and paste. Instead of making multiple mutated copies of a generic algorithm, an object-oriented approach would abstract the algorithm into a reusable encapsulated class. The class is written flexibly, with full support for inheritance and overloading, so that all calling code can use this generic code directly rather than mutating the original. As additional functionality is required, the library is extended (while retaining backward compatibility). This way, if the original algorithm has a bug to fix or can be improved, all software using it stands to benefit.

Generic programming provides additional tools to create abstractions.
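The contrast between mutated copies and a single reusable abstraction can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than a prescribed design: the function name `find_largest` and the sample data are hypothetical. Instead of pasting a "find the maximum" loop once for words and once for salary records, one generic routine takes the varying part (how an item is scored) as a parameter.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def find_largest(items: Iterable[T], key: Callable[[T], float]) -> Optional[T]:
    """Return the item with the largest key value, or None if items is empty.

    Hypothetical helper: the part that would differ between pasted copies
    (how an item is scored) is passed in as `key`.
    """
    best: Optional[T] = None
    best_score: Optional[float] = None
    for item in items:
        score = key(item)
        if best_score is None or score > best_score:
            best, best_score = item, score
    return best


# Two callers share the same routine instead of each holding a modified copy.
longest_word = find_largest(["ant", "giraffe", "cow"], key=len)
highest_paid = find_largest(
    [{"name": "Ada", "salary": 120}, {"name": "Bob", "salary": 95}],
    key=lambda row: row["salary"],
)
```

A fix to the loop (for example, a change in how ties are handled) is then made in one place and reaches every caller.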
=== Branching code ===
Branching code is a normal part of large-team software development, allowing parallel development on both branches and hence shorter development cycles. Classical branching has the following qualities:
• It is managed by a version control system that supports branching.
• Branches are re-merged once parallel development is completed.

Copy and paste is a less formal alternative to classical branching, often used when it is foreseen that the branches will diverge more and more over time, as when a new product is being spun off from an existing product.

As a way of spinning off a new product, copy-and-paste programming has some advantages. Because the new development initiative does not touch the code of the existing product:
• There is no need to regression test the existing product, saving on QA time associated with the new product launch and reducing time to market.
• There is no risk of introducing bugs into the existing product, which might upset the installed user base.

The downsides are:
• If the new product does not diverge as much as anticipated from the existing product, two code bases might need to be supported (at twice the cost) where one would have done. This can lead to expensive refactoring and manual merging down the line.
• The duplicate code base doubles the time required to implement changes that may be desired across both products; this increases time to market for such changes and may, in fact, wipe out any time gains achieved by branching the code in the first place.

As above, the alternative to a copy-and-paste approach is a modularized approach:
• Start by factoring out code to be shared by both products into libraries.
• Use those libraries (rather than a second copy of the code base) as the foundation for the development of the new product.
• If a third, fourth, or fifth version of the product is envisaged down the line, this approach is far stronger, because the ready-made code libraries dramatically shorten the development life cycle for any additional products after the second.
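The modularized approach can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the pricing rule, the function names, and the two "products". In a real code base the shared function would live in its own library module that both products import; it is shown in one file so the sketch stays self-contained.

```python
# --- Shared library (hypothetical; would normally be its own module) ---

def net_price_cents(gross_cents: int, discount_percent: int) -> int:
    """Apply a percentage discount to a price given in whole cents."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return gross_cents * (100 - discount_percent) // 100


# --- Existing product: built on the shared library ---

def product_a_quote(gross_cents: int) -> int:
    """Product A always applies a flat 10% discount."""
    return net_price_cents(gross_cents, 10)


# --- New product: reuses the library instead of a copied code base ---

def product_b_quote(gross_cents: int, loyal_customer: bool) -> int:
    """Product B varies the discount but shares the pricing rule."""
    return net_price_cents(gross_cents, 25 if loyal_customer else 5)
```

Only the product-specific policy differs between the two entry points. A correction to `net_price_cents` (say, a different rounding rule) is made once, whereas with two copied code bases it would have to be made and tested twice.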
=== Repetitive tasks or variations of a task ===
One of the most harmful forms of copy-and-paste programming occurs in code that performs a repetitive task, or variations of the same basic task depending on some variable. Each instance is copied from above and pasted in again, with minor modifications. Harmful effects include:
• The copy-and-paste approach often leads to large methods (a bad code smell).
• Each instance creates a code duplicate, with all the problems discussed in prior sections, but with a much greater scope. Scores of duplications are common; hundreds are possible. Bug fixes, in particular, become very difficult and costly in such code.
• Such code also suffers from significant readability issues, due to the difficulty of discerning exactly what differs between each repetition. This has a direct impact on the risks and costs of revising the code.

The procedural programming model strongly discourages the copy-and-paste approach to repetitive tasks. Under a procedural model, the preferred approach is to create a function or subroutine that performs a single pass through the task; this subroutine is then called by the parent routine, either repeatedly or, better yet, from some form of looping structure. Such code is termed "well decomposed", and is recommended as being easier to read and more readily extensible.

The general rule of thumb applicable to this case is "don't repeat yourself".

== Deliberate design choice ==