Complete tutor-style guide

DevOps Practices explained like a teacher guides a student

This is a practical DevOps practices guide for learners and working engineers. We will not treat DevOps as only CI/CD. We will connect Linux, networking, Git, containers, Kubernetes, OpenShift, Terraform, Ansible, Jenkins, GitOps, cloud, observability, SRE, security, AIOps and production troubleshooting into one real engineering flow.

Linux · Docker · Kubernetes · OpenShift · Terraform · Ansible · Jenkins · GitOps · Cloud · SRE · AIOps

Before we start: what is DevOps really?

Student

I keep hearing DevOps everywhere. Some people say it is Jenkins. Some say Kubernetes. Some say cloud. What is DevOps in real work?

Teacher

DevOps is not one tool. DevOps is a way of building, releasing, operating and improving software with fewer handover gaps between development, operations, security and reliability teams. Tools help, but the real practice is about repeatability, automation, visibility, safety and fast recovery.

A good DevOps engineer should understand how an application moves from source code to production, how infrastructure is created, how deployments are automated, how systems are monitored, and how incidents are handled when something breaks.

Simple definition: DevOps practices help teams deliver changes safely, operate systems reliably, and recover quickly when failures happen.

The end-to-end DevOps flow

If you want to understand DevOps clearly, imagine one small application moving from a developer laptop to production.

  1. Code is written and stored in Git. Branches, pull requests, reviews and commit history create collaboration and traceability.
  2. CI pipeline builds and tests the application. Unit tests, lint checks, security scans and image builds run automatically.
  3. Infrastructure is created using code. Terraform provisions cloud resources such as networks, compute, managed databases or Kubernetes clusters.
  4. Configuration is automated. Ansible or platform automation configures servers, packages, files, users and application settings.
  5. Application is packaged into a container. Docker or Podman creates a repeatable image with the application and its runtime dependencies.
  6. Application is deployed to Kubernetes or OpenShift. Deployments, Services, Routes/Ingress, Secrets, ConfigMaps, probes and autoscaling support runtime operations.
  7. GitOps keeps desired state under control. Argo CD can sync Kubernetes manifests from Git and detect drift.
  8. Observability watches the system. Metrics, logs, traces, dashboards and alerts help engineers understand behavior.
  9. Incidents are handled with SRE thinking. Engineers triage, mitigate, communicate, write postmortems and improve reliability.

1. Linux and networking practices

Student

Why do you always say Linux is the base? Can I directly learn Kubernetes and cloud?

Teacher

You can start Kubernetes directly, but production troubleshooting will become difficult. Containers run on Linux. Kubernetes nodes are Linux machines in most environments. Logs, processes, ports, DNS, filesystems, permissions and system services all come back to Linux fundamentals.

Process practice

Understand how to inspect running processes, CPU usage, memory usage and service status. In production, you often start with questions like: is the process running, is it stuck, is it consuming too many resources?

    ps aux | head
    ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head
    systemctl status nginx --no-pager
    journalctl -u nginx --since "30 minutes ago" --no-pager

Networking practice

DevOps troubleshooting often starts with connectivity. You should know how to test DNS, ports, routes, listening services and TLS behavior.

    ss -tulpn
    curl -vk https://example.com
    dig example.com
    ip route
    traceroute example.com
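To turn the "is anything listening on this port?" question into a repeatable check, here is a minimal sketch. It assumes the column layout printed by `ss -tln` (state first, local address in the fourth column); the function name `port_is_listening` is illustrative, not a standard tool.

```shell
# port_is_listening PORT: read `ss -tln` output on stdin and succeed if
# some TCP socket is listening on PORT. Assumes the `ss -tln` layout,
# where column 1 is the state and column 4 is the local address.
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # Local address looks like 0.0.0.0:443 or [::]:443;
            # the port is whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a live host:
#   ss -tln | port_is_listening 443 && echo "something listens on 443"
```

Reading from stdin keeps the check easy to test with saved output from an incident.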

Storage practice

Disk full issues are still common. Learn filesystem usage, inode usage, mount points and log growth.

    df -h
    df -i
    du -sh /var/log/* | sort -h
    lsblk
    mount | column -t
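A disk check is easier to act on when it only prints the filesystems that need attention. The sketch below assumes the POSIX `df -P` layout (capacity in column 5, mount point in column 6) and mount points without spaces; `full_filesystems` is a hypothetical helper name.

```shell
# full_filesystems THRESHOLD: read `df -P` output on stdin and print the
# mount point and usage of every filesystem at or above THRESHOLD percent.
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {                       # skip the header line
            use = $5
            sub(/%/, "", use)          # "91%" -> "91"
            if (use + 0 >= limit + 0) print $6, use "%"
        }
    '
}

# Usage on a live host:
#   df -P | full_filesystems 85
```

The same pattern works for inode pressure by feeding it `df -Pi` instead.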

Security practice

Linux permissions, users, groups, SSH keys and sudo access matter in every DevOps environment.

    id
    ls -l /var/www
    sudo -l
    chmod 640 app.conf
    chown appuser:appgroup app.conf

Interview framing: If an application is down, do not say “I will restart it first.” A stronger answer is: “I will check service status, logs, port binding, recent changes, resource pressure, dependencies and then decide the safest mitigation.”

2. Git and collaboration practices

Student

Git looks simple: add, commit, push. What are real DevOps Git practices?

Teacher

In DevOps, Git is not only source control. Git becomes the audit trail for code, infrastructure, Kubernetes manifests, pipeline definitions and sometimes runbooks. A good Git practice makes changes reviewable and recoverable.

Practice | Why it matters | Example
Small commits | Easier review and rollback | One change per commit instead of mixing app, infra and pipeline changes
Pull requests | Peer review and discussion | Terraform change reviewed before apply
Branch protection | Prevents accidental direct production changes | Main branch requires approvals and checks
Meaningful commit messages | Useful during incident review | “fix: increase readiness probe delay for model API”
Tagging releases | Trace deployment versions | v1.4.2 deployed to production

    git checkout -b fix/readiness-probe
    git status
    git diff
    git add deployment.yaml
    git commit -m "fix: tune readiness probe for slow startup"
    git push origin fix/readiness-probe

Real practice: If infrastructure and application changes are in Git, an incident responder can check what changed recently instead of guessing blindly.
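Meaningful commit messages can be enforced rather than hoped for. This is a minimal sketch of a subject-line check in the "type: summary" style used above. The list of allowed types and the function name `valid_commit_subject` are assumptions for illustration, not a rule of Git itself.

```shell
# valid_commit_subject SUBJECT: succeed if SUBJECT looks like
# "type: summary" or "type(scope): summary". The allowed types below
# are an assumed team convention, not something Git enforces.
valid_commit_subject() {
    printf '%s\n' "$1" |
        grep -Eq '^(feat|fix|docs|refactor|test|chore)(\([a-z0-9-]+\))?: .+$'
}

# Possible use inside a commit-msg hook, where $1 is the message file:
#   valid_commit_subject "$(head -n 1 "$1")" || { echo "bad subject"; exit 1; }
```

A check like this belongs in CI as well as in local hooks, because local hooks can be skipped.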

3. Container and image practices

Student

Dockerfile works on my laptop. Is that enough?

Teacher

For learning, maybe. For production, no. A good container image should be small, secure, repeatable, non-root where possible, and should separate build-time dependencies from runtime dependencies.

Good image practice

  • Use a clear base image.
  • Pin important versions where needed.
  • Do not store secrets in images.
  • Run as non-root where possible.
  • Use health checks or Kubernetes probes.
  • Keep image layers clean and small.

Bad image practice

  • Copying entire local directory blindly.
  • Running SSH inside every container.
  • Using latest tag everywhere without control.
  • Embedding passwords or tokens.
  • Installing unnecessary debugging tools in production images.

    FROM python:3.12-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    RUN useradd -r appuser
    USER appuser
    EXPOSE 8080
    CMD ["python", "app.py"]
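Some of the bad practices listed above can be caught before the image is ever built. The sketch below is a deliberately crude set of text checks, not a replacement for a real Dockerfile linter or image scanner. The function name `lint_dockerfile` and the patterns are assumptions, and the FROM check will misjudge lines that use flags such as `--platform`.

```shell
# lint_dockerfile FILE: print one WARN line per risky pattern found.
lint_dockerfile() {
    file="$1"
    # A FROM line with no tag, or with :latest, is not a pinned base image.
    if grep -Eiq '^FROM +[^ :]+(:latest)?( |$)' "$file"; then
        echo "WARN: base image is unpinned or uses :latest"
    fi
    # Without a USER instruction the container runs as root by default.
    if ! grep -Eiq '^USER +' "$file"; then
        echo "WARN: no USER instruction, container will run as root"
    fi
    # Secrets passed through ENV or ARG end up in the image metadata.
    if grep -Eiq '^(ENV|ARG) +[A-Za-z_]*(PASSWORD|TOKEN|SECRET)' "$file"; then
        echo "WARN: possible secret in ENV/ARG"
    fi
}

# Usage:
#   lint_dockerfile Dockerfile
```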

4. CI/CD practices

Student

Is CI/CD just Jenkins pipeline?

Teacher

No. Jenkins is one tool. GitHub Actions, GitLab CI and other systems can also do CI/CD. The practice is to automate build, test, security checks, packaging and deployment in a controlled way.

Continuous Integration: every code change is built and tested early.
Artifact creation: create container image or package with version tags.
Quality gates: run tests, linting, dependency checks and image scans.
Deployment automation: push changes to environment with approval gates where required.
Rollback planning: every deployment should have a recovery path.

Example pipeline thinking

    stages:
      - checkout
      - unit_test
      - build_image
      - scan_image
      - push_image
      - deploy_to_dev
      - approval
      - deploy_to_prod
      - smoke_test

Production trap: A pipeline that deploys fast but has no rollback, no smoke test, no approval and no observability is not mature automation. Speed without safety creates incidents.
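The core behavior of any pipeline engine is "run the stages in order and stop at the first failure". Here is a minimal sketch of that idea in plain shell. The stage functions are placeholders invented for the example; a real pipeline would call the build, scan and deploy tools instead.

```shell
# run_stages STAGE...: run each stage function in order and stop at the
# first failure, so a broken build never reaches the deploy stages.
run_stages() {
    for stage in "$@"; do
        echo "==> $stage"
        if ! "$stage"; then
            echo "FAILED at $stage, later stages skipped"
            return 1
        fi
    done
    echo "pipeline succeeded"
}

# Placeholder stages for illustration only.
unit_test()     { true; }
scan_image()    { false; }            # simulate a failed image scan
deploy_to_dev() { echo "deploying"; }

# Usage:
#   run_stages unit_test scan_image deploy_to_dev
```

With the simulated scan failure, `deploy_to_dev` never runs, which is exactly what a quality gate is for.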

5. Infrastructure as Code with Terraform

Student

Why should infrastructure be written as code? Cloud console is faster sometimes.

Teacher

Manual cloud console work may be fast once, but it is hard to repeat, review, audit and recover. Terraform turns infrastructure into version-controlled code. That means review, plan, apply, rollback strategy and environment consistency become possible.

Terraform practices

  • Use a remote backend with state locking for shared state.
  • Protect state files because they may contain sensitive values.
  • Review terraform plan before apply.
  • Use modules for repeatable infrastructure.
  • Separate environments carefully.
  • Avoid manual drift from console changes.

Typical resources

  • VPC/VNet and subnets
  • Security groups / NSGs
  • Compute instances
  • Kubernetes clusters
  • Load balancers
  • Managed databases
  • IAM roles and policies

    terraform init
    terraform fmt
    terraform validate
    terraform plan -out=tfplan
    terraform apply tfplan
    terraform state list

Interview framing: A strong answer should mention state, remote backend, locking, drift, modules, plan review, secrets handling and CI/CD integration.
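Plan review can be partly automated by refusing to continue when a plan would destroy resources. The sketch below assumes the human-readable summary line that `terraform plan` prints ("Plan: N to add, N to change, N to destroy.") and that color codes are disabled with `-no-color`. The helper name `planned_destroys` is hypothetical.

```shell
# planned_destroys: read `terraform plan -no-color` output on stdin and
# print how many resources the plan would destroy (0 if no summary line).
planned_destroys() {
    awk '
        /^Plan:/ {
            # Find the number in front of "to destroy".
            for (i = 2; i <= NF; i++)
                if ($i == "to" && $(i + 1) ~ /^destroy/) n = $(i - 1)
        }
        END { print n + 0 }
    '
}

# Usage as a gate in a pipeline step:
#   [ "$(terraform plan -no-color | planned_destroys)" = "0" ] || exit 1
```

Parsing text output is fragile across versions, so treat this as a teaching aid; a human should still read the plan before any production apply.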

6. Configuration management with Ansible

Student

If Terraform creates servers, why do we need Ansible?

Teacher

Terraform is mainly for provisioning infrastructure resources. Ansible is often used to configure operating systems, packages, services, files, users and application settings. In simple words: Terraform creates the machine; Ansible prepares the machine.

    ---
    - name: Configure web server
      hosts: web
      become: yes
      tasks:
        - name: Install nginx
          package:
            name: nginx
            state: present
        - name: Ensure nginx is running
          service:
            name: nginx
            state: started
            enabled: yes

Good Ansible practice

Use idempotent tasks, roles, inventories, variables and clear handlers. Do not write playbooks that blindly run shell commands for everything.

Production caution

Test changes in a lower environment. Use check mode when possible. Be careful with service restarts and configuration templates.
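Idempotency is the reason to prefer modules over raw shell commands. To see what the word means, here is a small sketch of an idempotent operation written by hand. The function name `ensure_line` is made up for the example, and it only imitates the "ok" versus "changed" reporting that Ansible shows for its tasks.

```shell
# ensure_line FILE LINE: append LINE to FILE only if it is not already
# present. A second run finds the line and changes nothing, which is the
# property that makes a task safe to repeat.
ensure_line() {
    if grep -Fxq -- "$2" "$1" 2>/dev/null; then
        echo "ok"
    else
        printf '%s\n' "$2" >> "$1"
        echo "changed"
    fi
}

# Usage:
#   ensure_line /etc/myapp.conf 'log_level=info'   # first run: changed
#   ensure_line /etc/myapp.conf 'log_level=info'   # second run: ok
```

A blind `echo ... >> file` would add a duplicate line on every run, which is the kind of drift that shell-heavy playbooks tend to create.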

7. Kubernetes and OpenShift practices

Student

Kubernetes has too many objects. Which practices matter most?

Teacher

Start with the runtime path: Pod, Deployment, Service, Ingress or Route, ConfigMap, Secret, PVC, probes, resources, logs and events. Then learn scheduling, RBAC, network policies, autoscaling and troubleshooting.

Area | Kubernetes practice | OpenShift angle
Deployment | Use Deployments, StatefulSets, probes and controlled rollouts | Use Deployment by default; legacy DeploymentConfig objects (deprecated in recent OpenShift 4 releases) still appear in older environments
Networking | Use Services and Ingress carefully | Routes are commonly used for external access
Security | RBAC, Secrets, network policies, non-root containers | SCCs and OpenShift security defaults matter
Storage | PVCs, StorageClasses, backup planning | Cluster storage integration and permissions are important
Troubleshooting | Describe, logs, events, rollout history, endpoints | Also check Routes, SCC, BuildConfig/ImageStream if used

CrashLoopBackOff troubleshooting example

    kubectl get pods -n app
    kubectl describe pod payment-api-xxxxx -n app
    kubectl logs payment-api-xxxxx -n app --previous
    kubectl get events -n app --sort-by=.lastTimestamp
    kubectl rollout history deployment/payment-api -n app

Production rule: Do not delete Pods repeatedly without reading previous logs and events. The restart itself may hide evidence.
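In a busy namespace, the first step is usually narrowing the Pod list down to the ones that need a `describe`. This sketch assumes the default `kubectl get pods` columns (NAME, READY, STATUS, ...); the helper name `unhealthy_pods` and the Pod names in the test data are invented. It does not catch Pods that are Running but not Ready.

```shell
# unhealthy_pods: read `kubectl get pods` output on stdin and print the
# name and status of every Pod that is not Running or Completed.
unhealthy_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage:
#   kubectl get pods -n app | unhealthy_pods
```

Each name it prints is a candidate for `kubectl describe pod` and `kubectl logs --previous`, in that order.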

8. GitOps and Argo CD practices

Student

What problem does GitOps solve if I already have a pipeline?

Teacher

Traditional pipelines often push changes into clusters. GitOps changes the model: Git becomes the desired state, and tools like Argo CD continuously compare the cluster state with Git state. This helps with drift detection, auditability and controlled sync.

GitOps benefits

  • Git is the source of truth.
  • Drift is visible.
  • Rollback can be Git-based.
  • Cluster changes are reviewable.
  • Environment differences can be tracked.

GitOps cautions

  • Do not store plain secrets in Git.
  • Be careful with auto-sync in production.
  • Separate app and platform responsibilities.
  • Understand prune and self-heal behavior.
  • Review changes before production sync.
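Drift detection is, at its heart, a comparison between what Git says and what the cluster runs. The sketch below shows only that idea using two local directories. It is not how Argo CD is implemented, and real live manifests contain server-added fields that a plain `diff` would flag. The function name and directory layout are assumptions; the status words mirror the Synced and OutOfSync labels Argo CD displays.

```shell
# detect_drift DESIRED_DIR LIVE_DIR: compare manifests rendered from Git
# with manifests exported from the cluster and report whether they differ.
detect_drift() {
    if diff -r "$1" "$2" >/dev/null 2>&1; then
        echo "Synced"
    else
        echo "OutOfSync"
        return 1
    fi
}

# Usage, with hypothetical directories:
#   detect_drift ./git-manifests ./live-export || echo "someone changed the cluster"
```

For a real cluster, `kubectl diff -f <dir>` performs a server-aware comparison and is the safer starting point.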

9. Cloud DevOps practices: AWS, Azure and GCP

Student

Should I learn one cloud or all clouds?

Teacher

Start with one cloud deeply. But understand cloud patterns that repeat everywhere: IAM, networking, compute, storage, managed Kubernetes, load balancing, monitoring, automation, backups and cost control. Once fundamentals are clear, moving between AWS, Azure and GCP becomes easier.

AWS practice areas

  • IAM roles and least privilege
  • VPC, subnets, route tables
  • EC2, ALB, S3, RDS
  • EKS and CloudWatch
  • Terraform automation

Azure practice areas

  • Resource groups
  • VNets and NSGs
  • VMs, Storage, Azure SQL
  • AKS and Monitor
  • Identity and RBAC

GCP practice areas

  • Projects and IAM
  • VPC networks
  • Compute Engine and GCS
  • GKE and Cloud Logging
  • Service accounts
Cloud practice: Always connect cloud learning to real systems: DNS, TLS, load balancers, firewall rules, autoscaling, logging, backups, IAM and cost.

10. Observability and SRE practices

Student

Monitoring means dashboards, right?

Teacher

Dashboards are useful, but observability is deeper. You should be able to understand the internal state of the system using metrics, logs, traces, events and user impact signals. SRE adds reliability thinking: SLIs, SLOs, error budgets, incident response and postmortems.

Signal | What it tells you | Example
Metrics | Numeric time-series behavior | CPU usage, request rate, error rate, latency
Logs | Detailed event records | Application errors, stack traces, auth failures
Traces | Request path across services | Which service made a request slow
Events | System lifecycle changes | Kubernetes scheduling, pod restarts, image pull errors
SLIs/SLOs | Reliability indicators and their targets | SLI: share of requests that succeed in under 300 ms; SLO: 99.9% over the measurement window
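An error budget is just arithmetic on the SLO. With a 99.9% target, 0.1% of requests are allowed to fail; the budget is how much of that allowance is still unspent. The sketch below works through the numbers; the function name and the request counts are made up for illustration.

```shell
# error_budget_left SLO_PERCENT TOTAL FAILED: print the percentage of the
# error budget that is still unspent for the measurement window.
error_budget_left() {
    awk -v slo="$1" -v total="$2" -v failed="$3" 'BEGIN {
        allowed = total * (100 - slo) / 100    # failures the SLO tolerates
        if (allowed <= 0) { print "0.0"; exit }
        printf "%.1f\n", (allowed - failed) / allowed * 100
    }'
}

# Example: 1,000,000 requests at a 99.9% SLO tolerate 1,000 failures.
# With 250 failures so far, 75% of the budget remains.
#   error_budget_left 99.9 1000000 250
```

A negative result means the budget is overspent, which is usually the signal to slow down releases and prioritize reliability work.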

Incident response practice

  1. Confirm customer impact and severity.
  2. Check dashboards, alerts, recent deployments and infrastructure changes.
  3. Mitigate safely before deep root cause analysis if impact is high.
  4. Communicate status clearly.
  5. Write post-incident notes and prevent recurrence.

11. DevSecOps and production safety practices

Student

Security is a separate team, right? Why should DevOps learn it?

Teacher

Security teams define standards, but DevOps engineers implement many controls: IAM, secrets, network rules, image scanning, pipeline permissions, Kubernetes RBAC, TLS, audit logs and deployment approvals. If DevOps ignores security, automation can spread mistakes very fast.

Security practices to learn

  • Least privilege IAM and RBAC
  • Secrets management
  • Container image scanning
  • Dependency scanning
  • Network policies and firewall rules
  • TLS certificate handling
  • Audit logging

Unsafe practices to avoid

  • Hardcoding secrets in Git
  • Using admin credentials in pipelines
  • Running all containers as root
  • Opening broad network rules
  • Skipping approvals for production
  • Disabling security checks to deploy faster
Production warning: A fast pipeline with powerful credentials is dangerous. Treat pipeline identities like production users and restrict what they can do.
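Hardcoded secrets are one of the easier problems to detect automatically. This is a minimal sketch of a pre-commit style scan. The patterns (the common `AKIA` prefix of AWS access key IDs plus a few assignment keywords) are illustrative assumptions and will miss many real secrets, so dedicated scanners remain the better choice in a pipeline.

```shell
# scan_for_secrets FILE...: print file:line for each line that looks like
# a hardcoded credential, and return 1 if anything was found. The matched
# text itself is not printed, so the secret does not leak into CI logs.
scan_for_secrets() {
    hits=$(grep -EinH \
        'AKIA[0-9A-Z]{16}|(password|passwd|secret|token)[[:space:]]*[:=][[:space:]]*[^[:space:]]+' \
        "$@" | cut -d: -f1,2) || true
    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
        return 1
    fi
}

# Usage:
#   scan_for_secrets config/*.env || echo "remove secrets before committing"
```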

12. AIOps and AI in DevOps practices

Student

Where does AI fit in DevOps? Is it replacing engineers?

Teacher

No serious team should treat AI as a replacement for engineering judgment. AI is useful as an assistant for summarizing logs, explaining alerts, drafting incident timelines, searching runbooks and generating hypotheses. The engineer still validates everything using real signals.

Safe AI use cases

  • Summarize Linux logs
  • Explain Kubernetes events
  • Draft incident updates
  • Suggest validation commands
  • Compare alert context
  • Generate interview practice scenarios

Unsafe AI use cases without approval

  • Deleting resources
  • Changing firewall rules
  • Applying Terraform changes
  • Rotating secrets
  • Restarting production services blindly
  • Auto-remediating without guardrails

Example AI troubleshooting prompt

    You are assisting with a production DevOps incident.
    Analyze the following logs, metrics and deployment history.
    Return:
    1. Timeline
    2. First visible failure
    3. Repeated errors
    4. Possible causes
    5. Validation commands
    6. Unsafe actions to avoid
    Do not assume root cause unless evidence is clear.

How all SkillUpWorks topics connect together

DevOps is not learned as isolated tools. Each topic supports the others.

SkillUpWorks topic | Real production purpose | Practice link
Linux | Operating system, process, logs, files, services | Linux questions
Linux Networking | DNS, ports, routing, connectivity, TLS troubleshooting | Networking questions
Bash scripting | Automation, checks, small operational tools | Bash questions
Docker | Container images and local runtime behavior | Docker questions
Kubernetes | Container orchestration and production application runtime | Kubernetes questions
OpenShift | Enterprise Kubernetes platform with Routes, SCC, Operators | OpenShift questions
Terraform | Infrastructure as Code and cloud provisioning | Terraform questions
Ansible | Configuration management and automation | Ansible questions
Jenkins | CI/CD pipelines and release automation | Jenkins questions
GitOps/Argo CD | Git-based Kubernetes deployment and drift control | Argo CD questions
AWS/Azure/GCP | Cloud infrastructure, IAM, networking, compute, managed services | AWS / Azure / GCP
Observability | Metrics, logs, traces, dashboards and alerts | Observability questions
SRE | Reliability, incidents, SLOs, error budgets | SRE questions
AIOps | AI-assisted operations, alerting and troubleshooting | AIOps questions

A practical DevOps project every learner should build

Student

I understand the topics separately. How do I practice them together?

Teacher

Build one end-to-end project. Do not only read. Create a small app and move it through the full DevOps lifecycle.

  1. Create a simple web application and push it to Git.
  2. Write a Dockerfile and run the app locally.
  3. Create a CI pipeline to test and build the image.
  4. Provision cloud infrastructure using Terraform.
  5. Use Ansible for any VM or server configuration.
  6. Deploy the app to Kubernetes or OpenShift.
  7. Expose it using a Service and an Ingress or Route.
  8. Add ConfigMaps, Secrets, resource limits and probes.
  9. Set up logs, metrics, alerts and dashboards.
  10. Break the app intentionally and troubleshoot it.
  11. Write incident notes and improve the design.
  12. Practice explaining the project in an interview.

Common DevOps mistakes beginners should avoid

Tool-first learning

Learning commands without understanding systems creates shallow knowledge. Learn why a tool exists, what problem it solves and how it fails.

Ignoring Linux basics

Kubernetes, containers and cloud still depend on operating system fundamentals. Do not skip logs, processes, DNS, ports and filesystems.

No troubleshooting practice

Only deploying happy-path labs is not enough. Break things and learn how to recover.

No security thinking

Secrets, permissions, IAM and network exposure are part of DevOps. Security cannot be an afterthought.

No observability

If you cannot see system behavior, you cannot operate it confidently.

Blind automation

Automation should have review, guardrails, rollback and logging. Bad automation can damage production faster than manual mistakes.

Interview framing: how to answer “What DevOps practices do you follow?”

Interviewer

What DevOps practices do you follow in a production environment?

Strong candidate answer

I follow practices that make delivery repeatable, visible and safe. Code and infrastructure changes should be version controlled in Git. CI pipelines should build, test, scan and package artifacts. Infrastructure should be managed using Terraform with remote state and plan review. Configuration should be automated with tools like Ansible where needed. Applications should run in containers and be deployed to Kubernetes or OpenShift with proper probes, resource limits, ConfigMaps, Secrets and rollout strategy. GitOps tools like Argo CD can maintain desired state and detect drift. Observability should include metrics, logs, traces and alerts connected to SLOs where possible. For incidents, I focus on impact, mitigation, communication, root cause analysis and post-incident improvement. I also consider security controls such as least privilege, secrets management, image scanning and approval gates. The goal is not only faster deployment, but safer and more reliable production operations.

Why this answer is strong: It connects tools to outcomes: repeatability, safety, visibility, reliability and recovery.

Suggested learning path on SkillUpWorks

  1. Start with Linux and Linux networking: Linux questions and networking questions.
  2. Learn scripting and Git habits: Bash scripting and Git/GitHub practice.
  3. Move into Docker and Kubernetes: Docker, Kubernetes and OpenShift.
  4. Add automation and infrastructure: Terraform, Ansible and cloud platforms.
  5. Learn CI/CD and GitOps: Jenkins and Argo CD/GitOps.
  6. Build reliability thinking: Observability, SRE and production troubleshooting.
  7. Explore AI-assisted DevOps: AI in DevOps and AIOps practice.