Before we start: what is DevOps really?
I keep hearing DevOps everywhere. Some people say it is Jenkins. Some say Kubernetes. Some say cloud. What is DevOps in real work?
DevOps is not one tool. DevOps is a way of building, releasing, operating and improving software with fewer handover gaps between development, operations, security and reliability teams. Tools help, but the real practice is about repeatability, automation, visibility, safety and fast recovery.
A good DevOps engineer should understand how an application moves from source code to production, how infrastructure is created, how deployments are automated, how systems are monitored, and how incidents are handled when something breaks.
The end-to-end DevOps flow
If you want to understand DevOps clearly, imagine one small application moving from a developer laptop to production.
- Branches, pull requests, reviews and commit history create collaboration and traceability.
- Unit tests, lint checks, security scans and image builds run automatically.
- Terraform provisions cloud resources such as networks, compute, managed databases or Kubernetes clusters.
- Ansible or platform automation configures servers, packages, files, users and application settings.
- Docker or Podman creates a repeatable image with the application and its runtime dependencies.
- Deployments, Services, Routes/Ingress, Secrets, ConfigMaps, probes and autoscaling support runtime operations.
- Argo CD can sync Kubernetes manifests from Git and detect drift.
- Metrics, logs, traces, dashboards and alerts help engineers understand behavior.
- Engineers triage, mitigate, communicate, write postmortems and improve reliability.
1. Linux and networking practices
Why do you always say Linux is the base? Can I directly learn Kubernetes and cloud?
You can start Kubernetes directly, but production troubleshooting will become difficult. Containers run on Linux. Kubernetes nodes are Linux machines in most environments. Logs, processes, ports, DNS, filesystems, permissions and system services all come back to Linux fundamentals.
Process practice
Understand how to inspect running processes, CPU usage, memory usage and service status. In production, you often start with questions like: is the process running, is it stuck, is it consuming too many resources?
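To make those questions concrete, here is a minimal Python sketch that reads a process's name, state and resident memory from `/proc/<pid>/status`. It assumes a Linux host; the helper names `parse_proc_status` and `summarize_process` are illustrative, not part of any standard tool.

```python
from pathlib import Path


def parse_proc_status(text: str) -> dict:
    """Parse the 'Key: value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize_process(pid: int) -> dict:
    """Return name, state and resident memory for one PID (Linux only)."""
    fields = parse_proc_status(Path(f"/proc/{pid}/status").read_text())
    return {
        "name": fields.get("Name"),
        "state": fields.get("State"),  # for example "S (sleeping)" or "D (disk sleep)"
        # Kernel threads have no VmRSS line, so fall back to zero.
        "rss_kb": int(fields.get("VmRSS", "0 kB").split()[0]),
    }
```

A process stuck in state `D` for a long time usually points at blocked I/O, while a steadily growing `rss_kb` is a first hint of a memory leak.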
Networking practice
DevOps troubleshooting often starts with connectivity. You should know how to test DNS, ports, routes, listening services and TLS behavior.
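The first two checks, DNS and port reachability, can be sketched with Python's standard `socket` module. The function names `resolve` and `port_open` are made up for this example, and the sketch does not cover routes or TLS.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the sorted unique IP addresses a hostname resolves to ([] on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `resolve` returns an empty list, the problem is name resolution. If it returns addresses but `port_open` is false, look at the listening service, firewall rules or routing instead.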
Storage practice
Disk full issues are still common. Learn filesystem usage, inode usage, mount points and log growth.
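A disk can report free space and still refuse writes when it runs out of inodes, so it helps to check both. The sketch below uses `shutil.disk_usage` and `os.statvfs` from the standard library; it assumes a Unix-like host, and `disk_report` is an illustrative name.

```python
import os
import shutil


def percent(used: int, total: int) -> float:
    """Return used/total as a percentage, or 0.0 when total is zero."""
    return round(100.0 * used / total, 1) if total else 0.0


def disk_report(path: str = "/") -> dict:
    """Report byte and inode usage for the filesystem containing path (Unix only)."""
    space = shutil.disk_usage(path)  # total, used and free in bytes
    st = os.statvfs(path)            # f_files and f_ffree are inode counts
    inodes_used = st.f_files - st.f_ffree
    return {
        "path": path,
        "disk_used_pct": percent(space.used, space.total),
        "inode_used_pct": percent(inodes_used, st.f_files),
    }
```

Running `disk_report("/var/log")` on a host with runaway log growth would typically show a high `disk_used_pct`, while millions of tiny files show up as a high `inode_used_pct`.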
Security practice
Linux permissions, users, groups, SSH keys and sudo access matter in every DevOps environment.
2. Git and collaboration practices
Git looks simple: add, commit, push. What are real DevOps Git practices?
In DevOps, Git is not only source control. Git becomes the audit trail for code, infrastructure, Kubernetes manifests, pipeline definitions and sometimes runbooks. A good Git practice makes changes reviewable and recoverable.
| Practice | Why it matters | Example |
|---|---|---|
| Small commits | Easier review and rollback | One change per commit instead of mixing app, infra and pipeline changes |
| Pull requests | Peer review and discussion | Terraform change reviewed before apply |
| Branch protection | Prevents accidental direct production changes | Main branch requires approvals and checks |
| Meaningful commit messages | Useful during incident review | “fix: increase readiness probe delay for model API” |
| Tagging releases | Trace deployment versions | v1.4.2 deployed to production |
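Meaningful commit messages are easier to enforce when a check runs automatically. Here is a small sketch of such a check in Python. The allowed types and the length limit are assumptions chosen for the example, loosely based on the `fix: ...` style shown in the table; adapt them to your team's convention.

```python
import re

# Assumed convention: "<type>: <summary>" or "<type>(<scope>): <summary>".
COMMIT_PATTERN = re.compile(
    r"^(feat|fix|docs|refactor|test|chore|ci)(\([a-z0-9-]+\))?: \S.{0,70}$"
)


def is_valid_subject(subject: str) -> bool:
    """Return True if the first line of a commit message matches the convention."""
    return COMMIT_PATTERN.match(subject) is not None
```

A check like this could run in a commit hook or as a CI step, so that vague subjects such as "updated stuff" are rejected before review.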
3. Container and image practices
Dockerfile works on my laptop. Is that enough?
For learning, maybe. For production, no. A good container image should be small, secure, repeatable, non-root where possible, and should separate build-time dependencies from runtime dependencies.
Good image practice
- Use a clear base image.
- Pin important versions where needed.
- Do not store secrets in images.
- Run as non-root where possible.
- Use health checks or Kubernetes probes.
- Keep image layers clean and small.
Bad image practice
- Copying the entire local directory into the image blindly.
- Running SSH inside every container.
- Using the `latest` tag everywhere without control.
- Embedding passwords or tokens.
- Installing unnecessary debugging tools in production images.
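Some of these practices can be checked mechanically. The following Python sketch flags three of them in a Dockerfile: an unpinned or `latest` base image, copying the whole build context, and a final stage that runs as root. It is a simplified line-based check written for this guide (it ignores line continuations and build arguments), not a replacement for a real linter.

```python
def lint_dockerfile(text: str) -> list[str]:
    """Return warnings for a few risky Dockerfile patterns (not a full parser)."""
    warnings = []
    stages = set()  # names declared with "FROM ... AS name"
    user = None     # last USER seen in the current stage
    for raw in text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue
        instruction = parts[0].upper()
        args = [p for p in parts[1:] if not p.startswith("--")]
        if instruction == "FROM" and args:
            image = args[0]
            user = None  # a new stage starts with the base image's default user
            last_segment = image.rsplit("/", 1)[-1]
            unpinned = "@" not in image and ":" not in last_segment
            if image not in stages and image != "scratch" and (
                unpinned or last_segment.endswith(":latest")
            ):
                warnings.append(f"base image '{image}' is not pinned to a version")
            if len(args) >= 3 and args[1].upper() == "AS":
                stages.add(args[2])
        elif instruction == "USER" and args:
            user = args[0]
        elif instruction in ("COPY", "ADD") and len(args) >= 2:
            if any(src in (".", "./") for src in args[:-1]):
                warnings.append("copies the whole build context; prefer explicit paths")
    if user is None or user.split(":")[0] in ("root", "0"):
        warnings.append("final stage runs as root; add a non-root USER")
    return warnings
```

A two-stage Dockerfile with pinned tags, explicit `COPY` paths and a numeric `USER` in the final stage produces no warnings from this check.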
4. CI/CD practices
Is CI/CD just Jenkins pipeline?
No. Jenkins is one tool. GitHub Actions, GitLab CI and other systems can also do CI/CD. The practice is to automate build, test, security checks, packaging and deployment in a controlled way.
Example pipeline thinking
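Whatever the tool, a pipeline is an ordered list of stages where a failure stops everything after it. The Python sketch below models that fail-fast behavior. The stage names in the usage note are examples, and real CI systems add caching, parallelism and approvals on top.

```python
from typing import Callable

# A stage is a name plus a function that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> dict:
    """Run stages in order and stop at the first failure (fail fast)."""
    completed = []
    for name, step in stages:
        if not step():
            return {"ok": False, "failed": name, "completed": completed}
        completed.append(name)
    return {"ok": True, "failed": None, "completed": completed}
```

With stages such as lint, unit tests, image build, security scan and deploy, a failing scan means the deploy step never runs, which is exactly the control a pipeline is supposed to give you.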
5. Infrastructure as Code with Terraform
Why should infrastructure be written as code? Cloud console is faster sometimes.
Manual cloud console work may be fast once, but it is hard to repeat, review, audit and recover. Terraform turns infrastructure into version-controlled code. That means every change can be reviewed as a plan before it is applied, environments can be kept consistent, and a bad change can be recovered by reverting to a previous committed version and applying again.
Terraform practices
- Use a remote backend for shared state.
- Protect state files because they may contain sensitive values.
- Review `terraform plan` before apply.
- Use modules for repeatable infrastructure.
- Separate environments carefully.
- Avoid manual drift from console changes.
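Plan review is easier when destructive changes stand out. As a sketch, the Python function below summarizes the JSON form of a saved plan, as produced by `terraform plan -out=tfplan` followed by `terraform show -json tfplan`. It reads only the `resource_changes` entries and their `change.actions` lists; the function name and the sample resource addresses in the usage note are illustrative.

```python
import json
from collections import Counter


def summarize_plan(plan_json: str) -> dict:
    """Count planned actions and list resources that would be destroyed."""
    plan = json.loads(plan_json)
    counts = Counter()
    destroyed = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["no-op"]:
            continue
        for action in actions:
            counts[action] += 1
        # A replacement appears as both "delete" and "create" on one resource.
        if "delete" in actions:
            destroyed.append(rc["address"])
    return {"counts": dict(counts), "destroyed": destroyed}
```

If `destroyed` contains something like a database or a network, that is the moment to stop and ask for a second reviewer before anyone runs apply.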
Typical resources
- VPC/VNet and subnets
- Security groups / NSGs
- Compute instances
- Kubernetes clusters
- Load balancers
- Managed databases
- IAM roles and policies
6. Configuration management with Ansible
If Terraform creates servers, why do we need Ansible?
Terraform is mainly for provisioning infrastructure resources. Ansible is often used to configure operating systems, packages, services, files, users and application settings. In simple words: Terraform creates the machine; Ansible prepares the machine.
Good Ansible practice
Use idempotent tasks, roles, inventories, variables and clear handlers. Do not write playbooks that blindly run shell commands for everything.
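Idempotent means a task describes the desired end state and changes nothing when that state already exists. The Python sketch below shows the idea for one small case, making sure a line is present in a file. It is an illustration of the concept only and does not reproduce how any Ansible module is implemented.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file. Return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Running it twice gives `True` and then `False`. A blind shell command such as appending with `echo ... >>` would add the line again on every run, which is why shell-everything playbooks drift over time.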
Production caution
Test changes in a lower environment. Use check mode when possible. Be careful with service restarts and configuration templates.
7. Kubernetes and OpenShift practices
Kubernetes has too many objects. Which practices matter most?
Start with the runtime path: Pod, Deployment, Service, Ingress or Route, ConfigMap, Secret, PVC, probes, resources, logs and events. Then learn scheduling, RBAC, network policies, autoscaling and troubleshooting.
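Two items on that list, probes and resources, are easy to forget in a manifest. Here is a hedged Python sketch that inspects a Deployment already loaded as a dictionary (for example from JSON) and reports containers without probes or resource settings. The function name is illustrative, and whether every container needs both probes is a team decision, not a Kubernetes rule.

```python
def check_deployment(manifest: dict) -> list[str]:
    """Report containers in a Deployment that lack probes or resource settings."""
    findings = []
    containers = manifest["spec"]["template"]["spec"]["containers"]
    for container in containers:
        name = container.get("name", "<unnamed>")
        for probe in ("readinessProbe", "livenessProbe"):
            if probe not in container:
                findings.append(f"{name}: missing {probe}")
        resources = container.get("resources", {})
        for kind in ("requests", "limits"):
            if not resources.get(kind):
                findings.append(f"{name}: missing resources.{kind}")
    return findings
```

A check like this fits well as a CI step on the repository that holds the manifests, before anything reaches a cluster.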
| Area | Kubernetes practice | OpenShift angle |
|---|---|---|
| Deployment | Use Deployments, StatefulSets, probes and controlled rollouts | Prefer Deployment; DeploymentConfig is deprecated in recent OpenShift 4.x releases but still appears in older environments |
| Networking | Use Services and Ingress carefully | Routes are commonly used for external access |
| Security | RBAC, Secrets, network policies, non-root containers | SCCs and OpenShift security defaults matter |
| Storage | PVCs, StorageClasses, backup planning | Cluster storage integration and permissions are important |
| Troubleshooting | Describe, logs, events, rollout history, endpoints | Also check Routes, SCC, BuildConfig/ImageStream if used |
CrashLoopBackOff troubleshooting example
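One way to approach a CrashLoopBackOff is to read the container statuses before guessing. The Python sketch below takes a pod object as returned by `kubectl get pod <name> -o json` and turns the `containerStatuses` fields into short hints. The hint wording and the function name are this guide's own; the fields it reads (`state.waiting.reason`, `lastState.terminated`, `restartCount`) come from the pod status.

```python
def diagnose_pod(pod: dict) -> list[str]:
    """Turn container statuses of a crash-looping pod into short hints."""
    hints = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting", {})
        if waiting.get("reason") != "CrashLoopBackOff":
            continue
        last = status.get("lastState", {}).get("terminated", {})
        hint = f"{status['name']}: restarted {status.get('restartCount', 0)} times"
        if last.get("reason") == "OOMKilled":
            hint += "; last exit was OOMKilled, review memory limits"
        elif "exitCode" in last:
            hint += f"; last exit code {last['exitCode']}, check previous logs"
        hints.append(hint)
    return hints
```

From there the usual next steps are `kubectl logs <pod> --previous` for the crashed container's output and `kubectl describe pod <pod>` for events such as failed probes or missing ConfigMaps and Secrets.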
8. GitOps and Argo CD practices
What problem does GitOps solve if I already have a pipeline?
Traditional pipelines often push changes into clusters. GitOps changes the model: Git becomes the desired state, and tools like Argo CD continuously compare the cluster state with Git state. This helps with drift detection, auditability and controlled sync.
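The comparison idea can be shown in a few lines of Python. This sketch walks the desired state and reports every field whose live value differs. It is a teaching model only: it is not Argo CD's diff logic, and it deliberately ignores fields that exist only in the live object, since clusters add many defaults.

```python
def find_drift(desired: dict, live: dict, prefix: str = "") -> list[str]:
    """Return dotted paths where the live state differs from the desired state."""
    drift = []
    for key, want in desired.items():
        path = f"{prefix}{key}"
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(find_drift(want, have, prefix=f"{path}."))
        elif want != have:
            drift.append(f"{path}: desired={want!r} live={have!r}")
    return drift
```

If someone scales a Deployment by hand from 3 to 5 replicas, the result is a single entry for `spec.replicas`, which is the kind of drift a GitOps tool would either report or revert depending on its sync settings.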
GitOps benefits
- Git is the source of truth.
- Drift is visible.
- Rollback can be Git-based.
- Cluster changes are reviewable.
- Environment differences can be tracked.
GitOps cautions
- Do not store plain secrets in Git.
- Be careful with auto-sync in production.
- Separate app and platform responsibilities.
- Understand prune and self-heal behavior.
- Review changes before production sync.
9. Cloud DevOps practices: AWS, Azure and GCP
Should I learn one cloud or all clouds?
Start with one cloud deeply. But understand cloud patterns that repeat everywhere: IAM, networking, compute, storage, managed Kubernetes, load balancing, monitoring, automation, backups and cost control. Once fundamentals are clear, moving between AWS, Azure and GCP becomes easier.
AWS practice areas
- IAM roles and least privilege
- VPC, subnets, route tables
- EC2, ALB, S3, RDS
- EKS and CloudWatch
- Terraform automation
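Least privilege is easier to practice with a concrete check. The Python sketch below looks through an AWS IAM policy document in JSON form and returns `Allow` statements that grant every action on every resource. It handles only the literal `*` case; patterns such as `s3:*` or conditions are out of scope for this example, and the function name is illustrative.

```python
import json


def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy_json: str) -> list[dict]:
    """Return Allow statements that grant all actions on all resources."""
    policy = json.loads(policy_json)
    risky = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions and "*" in resources:
            risky.append(statement)
    return risky
```

A policy that passes this check can still be too broad, so treat it as a first filter before a human review.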
Azure practice areas
- Resource groups
- VNets and NSGs
- VMs, Storage, Azure SQL
- AKS and Monitor
- Identity and RBAC
GCP practice areas
- Projects and IAM
- VPC networks
- Compute Engine and GCS
- GKE and Cloud Logging
- Service accounts
10. Observability and SRE practices
Monitoring means dashboards, right?
Dashboards are useful, but observability is deeper. You should be able to understand the internal state of the system using metrics, logs, traces, events and user impact signals. SRE adds reliability thinking: SLIs, SLOs, error budgets, incident response and postmortems.
| Signal | What it tells you | Example |
|---|---|---|
| Metrics | Numeric time-series behavior | CPU usage, request rate, error rate, latency |
| Logs | Detailed event records | Application errors, stack traces, auth failures |
| Traces | Request path across services | Which service made a request slow |
| Events | System lifecycle changes | Kubernetes scheduling, pod restarts, image pull errors |
| SLIs/SLOs | Reliability measurements and the targets set on them | 99.9% of requests succeed within 300 ms |
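An SLO target translates directly into an error budget. As a rough worked example in Python: with a 99.9% target, 0.1% of requests in the window are allowed to fail. Here "failed" is assumed to mean any request that errored or missed the latency threshold, and the function name is illustrative.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute how much of the error budget has been used in a window."""
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures:
        used_pct = round(100.0 * failed_requests / allowed_failures, 1)
    else:
        used_pct = float("inf")  # a 100% target leaves no budget at all
    return {
        "allowed_failures": round(allowed_failures),
        "budget_used_pct": used_pct,
        "slo_met": failed_requests <= allowed_failures,
    }
```

With one million requests and a 99.9% target, about 1,000 failures are allowed, so 250 failures use roughly a quarter of the budget. When the budget is nearly spent, teams often slow down risky releases and focus on reliability work.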
Incident response practice
11. DevSecOps and production safety practices
Security is a separate team, right? Why should DevOps learn it?
Security teams define standards, but DevOps engineers implement many controls: IAM, secrets, network rules, image scanning, pipeline permissions, Kubernetes RBAC, TLS, audit logs and deployment approvals. If DevOps ignores security, automation can spread mistakes very fast.
Security practices to learn
- Least privilege IAM and RBAC
- Secrets management
- Container image scanning
- Dependency scanning
- Network policies and firewall rules
- TLS certificate handling
- Audit logging
Unsafe practices to avoid
- Hardcoding secrets in Git
- Using admin credentials in pipelines
- Running all containers as root
- Opening broad network rules
- Skipping approvals for production
- Disabling security checks to deploy faster
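Hardcoded secrets are the easiest of these to catch early. The Python sketch below scans text line by line for three patterns: the `AKIA` prefix used by AWS access key IDs, a private key header, and a quoted value assigned to a name like `password` or `token`. The pattern set is a small assumption-laden sample, so expect both misses and false positives compared with a dedicated scanner.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded_credential": re.compile(
        r"(?i)\b(password|passwd|secret|token|api_key)\b\s*[:=]\s*['\"][^'\"\s]{6,}['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Run as a pre-commit or pipeline step, a check like this stops the commit before the secret reaches Git history, where removing it is much harder.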
12. AIOps and AI in DevOps practices
Where does AI fit in DevOps? Is it replacing engineers?
No serious team should treat AI as a replacement for engineering judgment. AI is useful as an assistant for summarizing logs, explaining alerts, drafting incident timelines, searching runbooks and generating hypotheses. The engineer still validates everything using real signals.
Safe AI use cases
- Summarize Linux logs
- Explain Kubernetes events
- Draft incident updates
- Suggest validation commands
- Compare alert context
- Generate interview practice scenarios
Unsafe AI use cases without approval
- Deleting resources
- Changing firewall rules
- Applying Terraform changes
- Rotating secrets
- Restarting production services blindly
- Auto-remediating without guardrails
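A simple guardrail is to let an assistant run only commands from a read-only allowlist and send everything else to a human. The Python sketch below shows that idea. The allowlist is an assumption for this example, and anything containing shell operators is treated as needing approval because it could chain a second command.

```python
import shlex

# Assumed allowlist of read-only command prefixes; tune it per environment.
READ_ONLY_PREFIXES = [
    ("kubectl", "get"),
    ("kubectl", "describe"),
    ("kubectl", "logs"),
    ("journalctl",),
    ("df",),
]
SHELL_OPERATORS = set(";|&`$<>")


def requires_approval(command: str) -> bool:
    """Return True unless the command is a plain, allowlisted read-only command."""
    if any(ch in SHELL_OPERATORS for ch in command):
        return True  # chaining or redirection could hide a second command
    try:
        tokens = tuple(shlex.split(command))
    except ValueError:
        return True  # unbalanced quotes: do not guess
    return not any(tokens[: len(prefix)] == prefix for prefix in READ_ONLY_PREFIXES)
```

The default is deliberately "ask a human": an unknown command such as `terraform apply` or `kubectl delete` needs approval simply because it is not on the list.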
Example AI troubleshooting prompt
How all SkillUpWorks topics connect together
DevOps is not learned as isolated tools. Each topic supports the others.
| SkillUpWorks topic | Real production purpose | Practice link |
|---|---|---|
| Linux | Operating system, process, logs, files, services | Linux questions |
| Linux Networking | DNS, ports, routing, connectivity, TLS troubleshooting | Networking questions |
| Bash scripting | Automation, checks, small operational tools | Bash questions |
| Docker | Container images and local runtime behavior | Docker questions |
| Kubernetes | Container orchestration and production application runtime | Kubernetes questions |
| OpenShift | Enterprise Kubernetes platform with Routes, SCC, Operators | OpenShift questions |
| Terraform | Infrastructure as Code and cloud provisioning | Terraform questions |
| Ansible | Configuration management and automation | Ansible questions |
| Jenkins | CI/CD pipelines and release automation | Jenkins questions |
| GitOps/Argo CD | Git-based Kubernetes deployment and drift control | Argo CD questions |
| AWS/Azure/GCP | Cloud infrastructure, IAM, networking, compute, managed services | AWS / Azure / GCP |
| Observability | Metrics, logs, traces, dashboards and alerts | Observability questions |
| SRE | Reliability, incidents, SLOs, error budgets | SRE questions |
| AIOps | AI-assisted operations, alerting and troubleshooting | AIOps questions |
A practical DevOps project every learner should build
I understand the topics separately. How do I practice them together?
Build one end-to-end project. Do not only read. Create a small app and move it through the full DevOps lifecycle.
Common DevOps mistakes beginners should avoid
Tool-first learning
Learning commands without understanding systems creates shallow knowledge. Learn why a tool exists, what problem it solves and how it fails.
Ignoring Linux basics
Kubernetes, containers and cloud still depend on operating system fundamentals. Do not skip logs, processes, DNS, ports and filesystems.
No troubleshooting practice
Only deploying happy-path labs is not enough. Break things and learn how to recover.
No security thinking
Secrets, permissions, IAM and network exposure are part of DevOps. Security cannot be an afterthought.
No observability
If you cannot see system behavior, you cannot operate it confidently.
Blind automation
Automation should have review, guardrails, rollback and logging. Bad automation can damage production faster than manual mistakes.
Interview framing: how to answer “What DevOps practices do you follow?”
What DevOps practices do you follow in a production environment?
I follow practices that make delivery repeatable, visible and safe. Code and infrastructure changes should be version controlled in Git. CI pipelines should build, test, scan and package artifacts. Infrastructure should be managed using Terraform with remote state and plan review. Configuration should be automated with tools like Ansible where needed. Applications should run in containers and be deployed to Kubernetes or OpenShift with proper probes, resource limits, ConfigMaps, Secrets and rollout strategy. GitOps tools like Argo CD can maintain desired state and detect drift. Observability should include metrics, logs, traces and alerts connected to SLOs where possible. For incidents, I focus on impact, mitigation, communication, root cause analysis and post-incident improvement. I also consider security controls such as least privilege, secrets management, image scanning and approval gates. The goal is not only faster deployment, but safer and more reliable production operations.
Suggested learning path on SkillUpWorks
Bash scripting and Git/GitHub practice.