Software Development Best Practices

Kuba Kolodziejczyk

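From: Poland
Universities: University of London, Osaka University

Past
Nanyang Technological University
OIST
Lexues (レキサス)

Present
AI Okinawa - Representative
LiLz Inc. - CTO
University of the Ryukyus - Part-time lecturer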

Course content

Testing
Version control
Code reviews
Configuration management
Environment sandboxes
Continuous integration
Continuous deployment
Infrastructure as code

Testing

It's hard to write complex code correctly
Corner cases
Quickly notice bugs due to changes in underlying code
Refactor with confidence
Executable documentation
Leads to better design

Unit testing

get_factorial(x)
get_faces(image)
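
A unit test for a function like get_factorial(x) might look like the sketch below. The body of get_factorial and the choice to raise ValueError on negative input are assumptions made for illustration; the slides only name the function.

```python
def get_factorial(x):
    """Return x! for a non-negative integer x (hypothetical implementation)."""
    if not isinstance(x, int) or x < 0:
        raise ValueError("x must be a non-negative integer")
    result = 1
    for value in range(2, x + 1):
        result *= value
    return result


def test_factorial_of_simple_input():
    assert get_factorial(5) == 120


def test_factorial_corner_case_zero():
    # Corner case: 0! is defined as 1
    assert get_factorial(0) == 1


def test_factorial_rejects_negative_input():
    try:
        get_factorial(-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative input")


if __name__ == "__main__":
    test_factorial_of_simple_input()
    test_factorial_corner_case_zero()
    test_factorial_rejects_negative_input()
    print("all unit tests passed")
```

The test functions follow the test_* naming convention, so a runner such as pytest should also be able to collect them.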

Acceptance testing

test_factorial_endpoint_with_simple_input()
Tests application starts up correctly
Tests environment - server has correct libraries installed, etc
Tests services connections - we can connect to database, etc
Tests all software components (code, databases, external services, etc) work correctly together
Used to test "happy path" - typical interactions user has with software
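
A rough sketch of what test_factorial_endpoint_with_simple_input() could look like, using only the Python standard library. The application here is a stand-in: the /factorial?x=... route, the JSON response shape and the FactorialHandler class are all assumptions, not part of the course material. The point is that the test starts the app and talks to it over HTTP, the way a user would.

```python
import json
import math
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class FactorialHandler(BaseHTTPRequestHandler):
    """Hypothetical app: GET /factorial?x=5 returns {"result": 120}."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/factorial":
            self.send_error(404)
            return
        x = int(parse_qs(url.query)["x"][0])
        body = json.dumps({"result": math.factorial(x)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def test_factorial_endpoint_with_simple_input():
    # Port 0 asks the operating system for any free port
    server = HTTPServer(("127.0.0.1", 0), FactorialHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:{}/factorial?x=5".format(server.server_port)
        # An empty ProxyHandler makes the request ignore any proxy settings
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            assert response.status == 200
            assert json.loads(response.read()) == {"result": 120}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    test_factorial_endpoint_with_simple_input()
    print("acceptance test passed")
```

In a real project the test would start the actual application and its services instead of an in-process stand-in.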

Version control

Should I use git?

What is version control?
Branches
Merges
Resolving conflicts
Why use version control in a personal project?
• easily revert to last known state
• don't worry about breaking something
• check what change introduced a problem (diff)
Why use version control in a team?
• all of the above
• don't step on each other's toes (branches)
• combine code in a controlled manner (merges)
• pull requests
• code reviews
• continuous integration
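
To give a feel for what a diff shows, here is a small sketch that uses Python's difflib module rather than git itself. The show_diff helper and the two code snippets are made up for illustration; in practice the version control tool produces this view for you.

```python
import difflib


def show_diff(old_text, new_text):
    """Return unified diff lines describing how old_text became new_text."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
    ))


# Hypothetical change: a missing base case is added to a recursive function
old = "def get_factorial(x):\n    return x * get_factorial(x - 1)\n"
new = (
    "def get_factorial(x):\n"
    "    if x == 0:\n"
    "        return 1\n"
    "    return x * get_factorial(x - 1)\n"
)

print("\n".join(show_diff(old, new)))
```

Lines prefixed with "+" were added and lines prefixed with "-" were removed, which is how a reviewer can see exactly what a change introduced.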

Code reviews

Code review process

Developer works on code in her own branch
Once the developer has finished a logical piece of work, she sends a pull request
Code is first reviewed by automatic tests - see continuous integration
A senior engineer reviews code - reads it, highlights issues if any
If code passes review, it's merged to master. Otherwise the developer fixes any issues and sends a new pull request

Benefits of code reviews

Catches bugs before they make it to master branch
Increases codebase understanding among team members
Training for junior (and senior!) engineers - reviewer looks at your code and gives you feedback
More cohesive architecture - reviewer can enforce similar style to whole codebase

Final boss on the quest to merge your code to master - the fearsome reviewer!

Configuration management

Typical environments

Development - one per engineer
Testing - checks code works on a machine with known configuration, not just your machine
Staging - as close to production as possible, but not used by client (if you have to mess up, do it there)
Production - client facing

Parts that differ between environments

Database and other services' URLs
Paths to resources
Ports application uses
Logging options

Options for passing configuration to application

Command line arguments
Environment variables
Configuration file

Command line arguments

Passing configuration to application:
./myapp --database-url=... --port=123 --loglevel=info

Using configuration inside application:

# myapp.py
import argparse


def get_arguments_parser():

    parser = argparse.ArgumentParser()
    parser.add_argument("--database-url")
    parser.add_argument("--port")
    parser.add_argument("--loglevel")

    return parser


def foo(database_url):

    print("Doing something with {}".format(database_url))


def main():

    parser = get_arguments_parser()
    arguments = parser.parse_args()

    # argparse exposes "--database-url" as the attribute "database_url"
    foo(arguments.database_url)


if __name__ == "__main__":
    main()

Disadvantages:
• a lot of typing, thus not very convenient
• easy to make a mistake
• doesn't scale

Environment variables:

Passing configuration to application:
DATABASE_URL=... PORT=123 LOGLEVEL=info ./myapp
Suffers from exactly the same problems as above

OR

# .bashrc
export DATABASE_URL=...
export PORT="123"
export LOGLEVEL="info"

Then from command line:

source ~/.bashrc
./myapp

Using configuration inside application:

# myapp.py
import os

def foo():

    database_url = os.environ["DATABASE_URL"]
    print("Doing something with {}".format(database_url))


def main():

    foo()

Disadvantages:
• Configuration isn't stored in project, thus can't be version controlled
• Environment variables are essentially globals, so it's hard to see all inputs to a given piece of code
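
One way to soften the "globals" problem is to read the environment in a single place and pass values on explicitly. The sketch below assumes the variable names from the example above; the load_settings helper and its default values are hypothetical.

```python
import os


def load_settings(environ):
    """Collect every environment-derived input in one place."""
    return {
        "database_url": environ["DATABASE_URL"],      # required
        "port": int(environ.get("PORT", "8000")),     # hypothetical default
        "loglevel": environ.get("LOGLEVEL", "info"),  # hypothetical default
    }


def foo(database_url):
    # foo receives its input as an argument instead of reading os.environ
    return "Doing something with {}".format(database_url)


def main():
    settings = load_settings(os.environ)
    print(foo(settings["database_url"]))
```

Because load_settings takes the mapping as a parameter, a test can pass a plain dictionary instead of changing real environment variables. Calling main() requires DATABASE_URL to be set.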

Configuration file:

Define configuration file:

# ./configurations/development_config.yaml
database_url: ...
port: 123
loglevel: info

Passing configuration to application:
./myapp --config-path=./configurations/development_config.yaml

Using configuration inside application:

# myapp.py
import argparse

import yaml


def get_arguments_parser():

    parser = argparse.ArgumentParser()
    parser.add_argument("--config-path")

    return parser


def foo(database_url):

    print("Doing something with {}".format(database_url))


def main():

    parser = get_arguments_parser()
    arguments = parser.parse_args()

    # argparse exposes "--config-path" as the attribute "config_path"
    with open(arguments.config_path) as file:

        # safe_load parses the YAML file into a plain dictionary
        config = yaml.safe_load(file)

    foo(config["database_url"])


if __name__ == "__main__":
    main()

Advantages:
• Version controlled
• Easy to add new options
• Visible in code

Sandboxing

Environment sandboxing

Environment resources
• Executables
• Compilers
• Libraries
Problems in not managing resources per project
• Can't easily reproduce environment on a different machine
• Environment update on one project changes environment definition on another project
• Environment update on one project can cause bugs on another project
Solution - environment management!
• Project keeps an environment definition file that defines all resources and their versions
• All resources are installed locally per project, rather than shared across the system
• Environment definition file is version controlled, just like any other code
• Using environment definition, exact environment can be reproduced on another machine
Sample environment definition file from Anaconda
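
To make the idea of an environment definition concrete, the sketch below checks whether the packages installed in the current Python environment match a list of pinned versions. It assumes pip-style "name==version" lines rather than the Anaconda format, and parse_pins and find_mismatches are hypothetical helpers, not part of any tool.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, get_installed_version=metadata.version):
    """Return {name: (pinned, installed_or_None)} for every unsatisfied pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

An empty result from find_mismatches means the current environment agrees with the definition file for every pinned package. Real environment managers go further and install the pinned versions for you.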

Services sandboxing

Services used on a typical web project
• Database
• File storage
• Web server configuration
Risks in modifying shared services
• Break production
• Break development for other engineers
• Transient bugs
Solution - services sandboxes!
• Each developer uses a local instance of service per project
• Most popular tool for services sandboxing: Docker

Continuous integration

What is continuous integration?

A systematic process for frequently merging new code into main codebase
Developers push their code to master branch often
Each push is run through automated tests and results are visible to everyone in company
Bad code is automatically rejected by integration server

What problems does it try to solve?

Large and difficult merges
Developers working on old code
Bugs surfacing weeks after they were introduced

Typical stages in continuous integration pipeline

Commit stage
• Build executables and other deliverables (e.g. documentation)
• Run static code analysis
• Run unit tests
Acceptance/Integration stage
• Run tests interacting with the app (implies app has to be started for each test)
• Run database migration tests
Optional - Performance profiling stage
• Throughput stress tests
• Load stress tests
Manual - code review stage
• Lead engineer or other senior engineer reviews the code

When and where should continuous integration pipeline be triggered

Automatically on integration server on every commit to master
Automatically on integration server on every pull request
Automatically on integration server on every commit to feature branch
Manually on integration server for any of above at engineer's request
Manually on local machine at engineer's request

Popular continuous integration tools

Jenkins
GoCD
Travis CI
Quite a few others...

Continuous deployment

Release often, fail fast
Continuous deployment is continuous integration taken to its logical conclusion

Common reasons for deployment failures

It's a complex manual process and someone gets a step wrong, e.g.
• forgets to rewrite symbolic link
• forgets to update environment
• uploads wrong external resources configuration (e.g. development config instead of production config)
• doesn't link new version to correct load balancer
• etc
New code doesn't handle existing data correctly - a regression bug
Bad database refactoring

Benefits of continuous deployment

Smaller window of change == smaller chance of failure
When you do fail, you have a smaller amount of code to look at to find the problem
Quicker time to market
Forces deployment process automation - which decreases manual process risk
If you can deploy fast, then you can recover fast too

Deployment best practices

Staging environment and production environment
• staging environment should be as similar to production as possible - real database, real load balancer, etc
• but - never use production data in staging!
Automate as much as possible
Sanity tests for production environment
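
As one example of automating a manual deployment step, the sketch below rewrites the symbolic link that points at the current release. The directory layout (one directory per release plus a "current" link) and the switch_current_release helper are assumptions; the approach relies on POSIX rename semantics.

```python
import os


def switch_current_release(release_dir, current_link):
    """Point current_link at release_dir by replacing the symlink in one step."""
    temporary_link = current_link + ".tmp"
    # Clean up a leftover temporary link from an interrupted earlier run
    if os.path.lexists(temporary_link):
        os.remove(temporary_link)
    os.symlink(release_dir, temporary_link)
    # os.replace renames the new link over the old one, so on POSIX systems
    # there is no moment at which current_link is missing
    os.replace(temporary_link, current_link)
```

Rolling back is then the same call with the previous release directory, which is one reason a fast automated deployment also gives a fast recovery.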

Infrastructure as code

Common infrastructure tasks

Provisioning servers
• start virtual machine with correct hardware and software combination
• update third party code
• update app code
• update operating system configuration
• update test updates
Setting up virtual network
Setting up network security groups
Setting up application gateways and load balancers
Hooking up application gateways <-> load balancers <-> servers chain
Provisioning and configuring other resources
• hard drives
• cloud storage
• databases
• static IPs
• etc

Problems with manually managing infrastructure

Low bus factor
Easy for mistakes to creep in
High risk of subtle differences between resources

Benefits of automating infrastructure tasks

Repeatable and consistent
Scalable
Faster (though not fast - provisioning servers often takes several minutes)
Code constitutes an executable documentation - never out of date
Easier to inherit, refactor and learn from
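
Much of the repeatability comes from describing the desired infrastructure as data and letting a tool work out the difference from what currently exists. The toy sketch below shows that comparison step only; plan_changes and the resource names are invented for illustration and do not reflect the API of any tool listed in this course.

```python
def plan_changes(desired, actual):
    """Compare desired and actual resources; return what to create, update, delete."""
    return {
        "create": sorted(name for name in desired if name not in actual),
        "update": sorted(
            name for name in desired
            if name in actual and desired[name] != actual[name]
        ),
        "delete": sorted(name for name in actual if name not in desired),
    }


# Hypothetical resources: the desired state lives in version control,
# the actual state is what the cloud provider currently reports
desired = {
    "web-server": {"size": "medium"},
    "database": {"size": "large"},
}
actual = {
    "web-server": {"size": "small"},
    "old-worker": {"size": "small"},
}

print(plan_changes(desired, actual))
```

Running the same comparison twice against an up-to-date environment yields no changes, which is what makes the process consistent across machines and engineers.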

Popular infrastructure as code tools

Terraform
AWS CloudFormation/Azure Resource Manager Templates
Ansible
SaltStack
Puppet
Chef

Software development best practices

AI Okinawa

Jakub “Kuba” Kolodziejczyk